(57) Abstract

The present invention provides recombinant DNA comprising a transcription promoter and a downstream sequence to be expressed, 
in operable linkage therewith, wherein the transcription promoter comprises a region found upstream of the open reading frame of a 
highly expressed Phaffia gene, preferably a glycolytic pathway gene, more preferably the gene coding for Glyceraldehyde-3-Phosphate
Dehydrogenase. Further preferred recombinant DNAs according to the invention contain promoters of ribosomal protein encoding genes,
more preferably wherein the transcription promoter comprises a region found upstream of the open reading frame encoding a protein as
represented by one of the amino acid sequences depicted in any one of SEQIDNOs: 24 to 50. According to a further aspect of the invention
an isolated DNA sequence coding for an enzyme involved in the carotenoid biosynthetic pathway of Phaffia rhodozyma is provided,
preferably wherein said enzyme has an activity selected from isopentenyl pyrophosphate isomerase activity, geranylgeranyl pyrophosphate
synthase activity, phytoene synthase activity, phytoene desaturase activity and lycopene cyclase activity, still more preferably those coding
for an enzyme having an amino acid sequence selected from the one represented by SEQIDNO: 13, SEQIDNO: 15, SEQIDNO: 17,
SEQIDNO: 19, SEQIDNO: 21 or SEQIDNO: 23. Further embodiments concern vectors, transformed host organisms, methods for making
proteins and/or carotenoids, such as astaxanthin, and methods for isolating highly expressed promoters from Phaffia.
Improved methods for transforming Phaffia strains, transformed Phaffia
strains so obtained and recombinant DNA in said methods

Technical field 

The present invention relates to methods for transforming Phaffia yeast, transformed Phaffia
strains, as well as recombinant DNA for use therein. 

Background of the invention 

Methods for transforming the yeast Phaffia rhodozyma have been disclosed in European patent 
application 0 590 707 A1. These methods involve incubation of protoplasts with DNA or incubation of
Phaffia cells with DNA followed by lithium acetate treatment. The recombinant DNA used to transform
Phaffia strains with either of these methods comprised a Phaffia actin gene promoter to drive expression 
of the selectable marker genes coding for resistance against G418 or phleomycin. The methods involve 
long PEG and lithium acetate incubation times and transformation frequencies are low. When protoplasts 
are used, the transformation frequency is dependent on the quality of the protoplast suspension, making 
the procedure less reliable. 

Recently a method for transforming Phaffia strains has been reported by Adrio J.L. and Veiga 
M. (July 1995, Biotechnology Techniques Vol. 9, No. 7, pp. 509-512). With this method the
transformation frequencies are in the range of 3 to 13 transformants per µg DNA, which is low. A
further disadvantage of the method disclosed by these authors consists in increased doubling time of the 
transformed cells. The authors hypothesised that this may be due to interference of the autonomously 
replicating vector with chromosome replication. 

Clearly, there is still a need for a reliable and efficient method of transforming Phaffia strains 
with foreign DNA. It is an objective of the present invention to provide methods and means to achieve 
this. It is a further objective of the invention to optimize expression of certain genes in Phaffia 
rhodozyma in order to make Phaffia a more suitable production host for certain valuable compounds.

Summary of the invention 
The invention provides a method for obtaining a transformed Phaffia strain, comprising the 
steps of contacting cells or protoplasts of a Phaffia strain with recombinant DNA under conditions 
conducive to uptake thereof, said recombinant DNA comprising a transcription promoter and a
downstream sequence to be expressed which is heterologous to said transcription promoter, in operable
linkage therewith, identifying Phaffia rhodozyma cells or protoplasts having obtained the said
recombinant DNA in expressible form, wherein the transcription promoter comprises a region that is
found upstream of the open reading frame of a highly expressed Phaffia gene. According to a preferred
embodiment of the invention said highly expressed Phaffia gene is a glycolytic pathway gene, more
preferably the glycolytic pathway gene is coding for Glyceraldehyde-3-Phosphate Dehydrogenase 



WO 97/23633 PCT/EP96/05887

(GAPDH). According to one aspect of the invention, said heterologous downstream sequence comprises
an open reading frame coding for resistance against a selective agent, such as G418 or phleomycin. 

Another preferred method according to the invention is one, wherein said recombinant DNA 
comprises further a transcription terminator downstream from the said DNA to be expressed, in operable 
linkage therewith, which transcription terminator comprises a region found downstream of the open
reading frame of a Phaffia gene. It is still further preferred, that the recombinant DNA is in the form of
linear DNA.

Another preferred embodiment comprises, in addition to the steps above, the step of providing
an electropulse after contacting of Phaffia cells or protoplasts with DNA.

According to another embodiment the invention provides a transformed Phaffia strain capable
of high-level expression of a heterologous DNA sequence, which strain is obtainable by a method 
according to the invention. Preferably, said Phaffia strain contains at least 10 copies of the said 
recombinant DNA integrated into its genome, such as a chromosome, particularly in the ribosomal DNA 
locus of said chromosome. 

The invention also provides recombinant DNA comprising a transcription promoter and a 
heterologous downstream sequence to be expressed, in operable linkage therewith, wherein the 
transcription promoter comprises a region found upstream of the open reading frame of a highly 
expressed Phaffia gene, preferably a glycolytic pathway gene, more preferably a gene coding for
Glyceraldehyde-3-Phosphate Dehydrogenase. 

Also provided is recombinant DNA according to the invention, wherein the heterologous 
downstream sequence comprises an open reading frame coding for reduced sensitivity against a selective 
agent, preferably G418 or phleomycin. Said recombinant DNA preferably comprises further a
transcription terminator downstream from the said heterologous DNA sequence to be expressed, in 
operable linkage therewith. 

Further aspects of the invention concern a microorganism harbouring recombinant DNA 
according to the invention, preferably Phaffia strains, more preferably Phaffia rhodozyma strains, as well
as cultures thereof. 

According to still other preferred embodiments isolated DNA fragments are provided 
comprising a Phaffia GAPDH-gene, or a fragment thereof, as well as the use of such a fragment for 
making a recombinant DNA construct. According to one embodiment of this aspect said fragment is a 
regulatory region located upstream or downstream of the open reading frame coding for GAPDH, and it 
is used in conjunction with a heterologous sequence to be expressed under the control thereof.

The invention according to yet another aspect, provides a method for producing a protein or a 
pigment by culturing a Phaffia strain under conditions conducive to the production of said protein or
pigment, wherein the Phaffia strain is a transformed Phaffia strain according to the invention. 

According to another aspect of the invention, a method is provided for obtaining a transformed Phaffia
strain, comprising the steps of

contacting cells or protoplasts of a Phaffia strain with recombinant DNA under conditions
conducive to uptake thereof,

said recombinant DNA comprising a transcription promoter and a downstream sequence to be
expressed in operable linkage therewith,

identifying Phaffia rhodozyma cells or protoplasts having obtained the said recombinant DNA 
in expressible form, 

wherein the downstream sequence to be expressed comprises an isolated DNA sequence
coding for an enzyme involved in the carotenoid biosynthetic pathway of Phaffia rhodozyma. Preferably,
said enzyme has an activity selected from geranylgeranyl pyrophosphate synthase (crtE), phytoene
synthase (crtB), phytoene desaturase (crtI) and lycopene cyclase (crtY), more preferably an enzyme
having an amino acid sequence selected from the one represented by SEQIDNO: 13, SEQIDNO: 15,
SEQIDNO: 17 and SEQIDNO: 19. According to a further embodiment, the transcription promoter is
heterologous to said isolated DNA sequence, such as a promoter of a glycolytic pathway gene in Phaffia. Especially
preferred according to this embodiment is the Glyceraldehyde-3-Phosphate Dehydrogenase (GAPDH)
gene promoter.

Also provided is a transformed Phaffia strain obtainable by a method according to the
invention and capable of expressing, preferably over-expressing the DNA sequence encoding an enzyme
involved in the carotenoid biosynthesis pathway.

The invention is also embodied in recombinant DNA comprising an isolated DNA sequence 
according to the invention, preferably in the form of a vector. 

Also claimed is the use of such a vector to transform a host, such as a Phaffia strain. 
Also provided is a host obtainable by transformation, optionally of an ancestor, using a method according to
any one of claims 1 to 5, wherein said host is preferably capable of over-expressing DNA according to
the invention. 

According to a further embodiment a method is provided for expressing an enzyme involved 
in the carotenoid biosynthesis pathway, by culturing a host according to the invention under conditions 
conducive to the production of said enzyme. Also provided is a method for producing a carotenoid by
cultivating a host according to the invention under conditions conducive to the production of carotenoid. 
The following figures further illustrate the invention. 



Description of the Figures 

Fig. 1. Mapping of the restriction sites around the Phaffia rhodozyma GAPDH gene. Ethidium
bromide stained 0.8 % agarose gel (A) and Southern blot of chromosomal DNA (B) and
cosmid pPRGDHcos1 (C) digested with several restriction enzymes and hybridized with the
300-bp PCR fragment of the Phaffia rhodozyma GAPDH gene. Lane 1, DNA x KpnI; 2,
x PstI; 3, xSffffll; 4, x SphI; L, lambda DNA digested with BstEII; 5, x&/I; 6, x XbaI and 7,
x XhoI.

The blot was hybridized in 6 x SSC, 5 x Denhardt's, 0.1 % SDS, 100 ng/ml herring sperm
DNA at 65°C and washed with 0.1 x SSC/0.1% SDS at 65°C. Exposure time of the
autoradiogram was 16 h for the cosmid and 48 h for the blot containing the chromosomal
DNA.
Fig. 2. The organisation of two subclones; pPRGDHB and derivative (A) and pPRGDH6 and
derivatives (B) containing (a part of) the GAPDH gene of Phaffia rhodozyma. The PCR probe
is indicated by a solid box. The direction and extent of the sequence determination is indicated 
by arrows. 

solid boxes: GAPDH coding sequence 

open box: 5' upstream and promoter region of GAPDH 

open box: 3' non-coding Phaffia rhodozyma GAPDH sequence

solid line: GAPDH intron 

hatched box: Poly-linker containing sites for different restriction enzymes 
dotted line: deleted fragments 
Fig. 3. Cloning diagram of Phaffia transformation vector; pPR2. 

solid box: 5' upstream and promoter sequence of GAPDH 
hatched box: G418
solid line: pUC19

open box: ribosomal DNA of Phaffia rhodozyma

Only restriction sites used for cloning are indicated.
Fig. 4. Construction of pPR2T from pPR2.

Solid box (BamHI - HindIII fragment): GAPDH transcription terminator from Phaffia.

All other boxes and lines are as in Fig. 3. Only relevant details have been depicted. 
Fig. 5. Detailed physical map of pGB-Ph9. bps = basepairs; rDNA = ribosomal DNA locus of Phaffia;
act.pro 2 = actin transcription promoter; actl 5' non-translated and aminoterminal region of
the open reading frame; NON COD. = non-coding region downstream of G418-gene;
Fig. 6. Detailed physical map of pPR2. GPDHpro = GAPDH transcription promoter region from
Phaffia. Other acronyms as in Fig. 5.
Fig. 7. Detailed physical map of pPR2T. Tgdh = GAPDH transcription terminator of Phaffia. All

other acronyms as in Fig. 5 and 6. 
Fig. 8. Overview of the carotenoid biosynthetic pathway of Erwinia uredovora. 
Fig. 9. Representation of cDNA fragments and a restriction enzyme map of the plasmids pPRcrtE 

(A); pPRcrtB (B), pPRcrtI (C) and pPRcrtY (D).

Detailed description of the invention 

The invention provides in generalised terms a method for obtaining a transformed Phaffia
strain, comprising the steps of

contacting cells or protoplasts of a Phaffia strain with recombinant DNA under conditions
conducive to uptake thereof,

said recombinant DNA comprising a transcription promoter and a downstream sequence to be
expressed which is heterologous to said transcription promoter, in operable linkage therewith,

identifying Phaffia rhodozyma cells or protoplasts having obtained the said recombinant DNA
in expressible form,

wherein the transcription promoter comprises a region that is found upstream of the open
reading frame of a highly expressed Phaffia gene.

In order to illustrate the various ways of practicing the invention, some embodiments will be
highlighted and the meaning or scope of certain phrases will be elucidated.

The meaning of the expression recombinant DNA is well known in the art of genetic 
modification, meaning that a DNA molecule is provided, single or double stranded, either linear or 
circular, nicked or otherwise, characterised by the joining of at least two fragments of different origin. 
Such joining is usually, but not necessarily done in vitro. Thus, within the ambit of the claim are 
molecules which comprise DNA from different organisms or different genes of the same organism, or 
even different regions of the same gene, provided the regions are not adjacent in nature. The 
recombinant DNA according to the invention is characterised by a transcription promoter found upstream 
of an open reading frame of a highly expressed Phaffia gene, fused to a heterologous DNA sequence. 
With heterologous is meant 'not naturally adjacent'. Thus the heterologous DNA sequence may be from
a different organism, a different gene from the same organism, or even of the same gene as the
promoter, provided that the downstream sequence has been modified, usually in vitro. Such modification 
may be an insertion, deletion or substitution, affecting the encoded protein and/or its entrance into the 
secretory pathway, and/or its post-translational processing, and/or its codon usage. 

The strong transcription promoter according to the invention must be in operable linkage with 
the heterologous downstream sequence in order to allow the transcriptional and translational machinery 
to recognise the starting signals. The regions upstream of open reading frames of highly expressed 
Phaffia genes contain TATA-like structures which are positioned at 26 to about 40 nucleotides upstream 
of the cap-site; the latter roughly corresponds with the transcriptional start site. Thus in order to allow 
transcription of the heterologous downstream sequence to start at the right location similar distances are 
to be respected. It is common knowledge, however, that there is a certain tolerance in the location of the 
TATA-signal relative to the transcription start site. Typically, mRNAs of the eukaryotic type contain a 
5'-untranslated leader sequence (5'-utl), which is the region spanning the transcription start site to the
start of translation; this region may vary from 30 to more than 200 nucleotides. Neither the length nor
the origin of the 5'-utl is very critical; preferably it will be between 30 and 200 nucleotides. It may be
from the same gene as the promoter, or it may be from the gene coding for the heterologous protein. It
is well known that eukaryotic genes contain signals for the termination of transcription and/or
polyadenylation, downstream of the open reading frame. The location of the termination signal is
variable, but will typically be between 10 and 200 nucleotides downstream from the translational stop
site (the end of the open reading frame), more usually between 30 and 100 nucleotides downstream from
the translational stop site. Although the choice of the transcription terminator is not critical, it is found
that when the terminator is selected from a region downstream of a Phaffia gene, preferably of a
highly expressed Phaffia gene, more preferably from the GAPDH-encoding gene, the level of expression,
as well as the frequency of transformation, is improved.
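The distances discussed in this paragraph can be collected into a simple consistency check. The following is a minimal sketch in Python, offered for illustration only: the class `CassetteLayout`, the function `check_layout` and the field names are hypothetical and do not appear in the specification, and the numeric ranges are taken from the text above (the upper bound for the TATA position is stated there as "about 40", so it should be read as approximate).

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Ranges quoted in the description, in nucleotides.
TATA_TO_CAP: Tuple[int, int] = (26, 40)          # TATA-like element upstream of the cap site (upper bound approximate)
LEADER_PREFERRED: Tuple[int, int] = (30, 200)    # preferred length of the 5'-untranslated leader
TERMINATOR_TYPICAL: Tuple[int, int] = (10, 200)  # termination signal downstream of the translational stop


@dataclass
class CassetteLayout:
    """Hypothetical description of an expression cassette (illustration only)."""
    tata_to_cap: int         # distance from TATA-like element to cap site
    leader_length: int       # transcription start site to start of translation
    stop_to_terminator: int  # translational stop site to termination signal


def _within(value: int, bounds: Tuple[int, int]) -> bool:
    low, high = bounds
    return low <= value <= high


def check_layout(layout: CassetteLayout) -> Dict[str, bool]:
    """Report, per distance, whether it falls inside the range given in the text."""
    return {
        "tata_to_cap": _within(layout.tata_to_cap, TATA_TO_CAP),
        "leader_length": _within(layout.leader_length, LEADER_PREFERRED),
        "stop_to_terminator": _within(layout.stop_to_terminator, TERMINATOR_TYPICAL),
    }
```

A value outside a range is not necessarily a defect, since the text notes a certain tolerance in these positions; the check merely flags distances that deserve a second look.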

It was found that significant numbers of clones were obtained which could grow on very high 
G418 concentrations (up to, and over, 1 mg/ml). Transcription promoters according to the invention are 
said to be from highly expressed genes, when they can serve to allow growth of transformed Phaffia
cells, when linked to a G418 resistance gene as disclosed in the Examples, in the presence of at least 200
µg/ml, preferably more than 400, even more preferably more than 600, still more preferably more than
800 µg/ml of G418 in the growth medium. Examples of such promoters are, in addition to the promoter
upstream from the GAPDH-gene in Phaffia, the promoters from Phaffia genes which are homologous to 
highly expressed genes from other yeasts, such as Pichia, Saccharomyces, Kluyveromyces, or fungi, such 
as Trichoderma, Aspergillus, and the like. Promoters which fulfill the requirements according to the 
invention, may be isolated from genomic DNA using molecular biological techniques which are, as such, 
all available to the person skilled in the art. The present invention provides a novel strategy for isolating 
strong promoters from Phaffia as follows. A cDNA-library is made from Phaffia mRNA, using known 
methods. Then for a number of clones with a cDNA insert, the DNA fragment (which represents the 
cDNA complement of the expressed mRNA) is sequenced. As a rule all fragments represent expressed 
genes from Phaffia. Moreover, genes that are abundantly expressed (such as the glycolytic genes)
are overrepresented in the mRNA population. Thus, the number of DNA-fragments to be sequenced in 
order to find a highly expressed gene, is limited to less than 100, probably even less than 50. The 
sequencing as such is routine, and should not take more than a couple of weeks. The nucleotide 
sequences obtained from this limited number of fragments are subsequently compared to the known
sequences stored in electronic databases such as EMBL or Geneseq. If a fragment shows homology of 
more than 50% over a given length (preferably more than 100 basepairs) the fragment is likely to 
represent the Phaffia equivalent of the gene found in the electronic database. In yeasts other than 
Phaffia, a number of highly expressed genes have been identified. These genes include the glycolytic 
pathway genes, phosphoglucoisomerase, phosphofructokinase, phosphotrioseisomerase,
phosphoglucomutase, enolase, pyruvate kinase, alcohol dehydrogenase genes (EP 120 551, EP 0 164
556; Rosenberg S. et al., 1990, Meth. Enzymol. 185, 341-351; Tuite M.F. 1982, EMBO J. 1, 603-608;
Price V. et al., 1990, Meth. Enzymol. 185, 308-318) and the galactose regulon (Johnston, S.A. et al.,
1987, Cell 50, 143-146). Accordingly, those Phaffia cDNA fragments that are significantly homologous
to the highly expressed yeast genes (more than 40%, preferably more than 50% identity in a best match
comparison over a range of more than 50, preferably more than 100 nucleotides) should be used to
screen a genomic library from Phaffia, to find the corresponding gene. Employing this method, 14 highly
expressed mRNAs from Phaffia rhodozyma have been copied into DNA, sequenced, and their (putative)
open reading frames compared to nucleic acid and amino acid sequence databases. It turned out
that 13 out of these fourteen cDNAs coded for ribosomal protein genes, of which one coded
simultaneously for ubiquitin; one cDNA codes for a glucose-repressed gene. The isolation of the genes
and the promoters usually found upstream of the coding regions of these genes is now underway, and it
is anticipated that each of these transcription promoters may advantageously be used to express 
heterologous genes, such as carotenoid biosynthesis genes. Among the genes and transcription promoters 
especially preferred according to this invention are the promoter found upstream of the ubiquitin- 
ribosomal 40S protein corresponding to the cDNA represented in SEQ1DNO:IO, the glucose-repressed 
cDNA represented in SEQIDNO:26, the 40S ribosomal protein S27 encoding cDNA represented in 
SEQIDNO:28, the 60S ribosomal protein P1α encoding cDNA represented by SEQIDNO:30, the 60S
ribosomal protein L37e encoding cDNA represented in SEQIDNO:32, the 60S ribosomal protein L27a
encoding cDNA represented in SEQIDNO:34, the 60S ribosomal protein L25 encoding cDNA
represented in SEQIDNO:36, the 60S ribosomal protein P2 encoding cDNA represented in
SEQIDNO:38, the 40S ribosomal protein S17A/B encoding cDNA represented in SEQIDNO:40, the 40S
ribosomal protein S31 encoding cDNA represented in SEQIDNO:42, the 40S ribosomal protein S10
encoding cDNA represented in SEQIDNO:44, the 60S ribosomal protein L37A encoding cDNA
represented in SEQIDNO:46, the 60S ribosomal protein L34 encoding cDNA represented in
SEQIDNO:48, or the 40S ribosomal protein S16 encoding cDNA represented in SEQIDNO:50.
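The homology criterion used in the screening strategy above (more than 40% identity over more than 50 nucleotides, preferably more than 50% over more than 100) can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the two sequences are assumed to have been aligned beforehand by an external tool, gap characters (`-`) are counted as mismatches, and the function names `percent_identity` and `is_candidate` are hypothetical. It does not perform the database search or the "best match" alignment itself.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Gap characters ('-') never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def is_candidate(
    aligned_a: str,
    aligned_b: str,
    min_identity: float = 40.0,  # "more than 40%" in the text
    min_length: int = 50,        # "more than 50 nucleotides" in the text
) -> bool:
    """True if the aligned region exceeds both thresholds."""
    return (
        len(aligned_a) > min_length
        and percent_identity(aligned_a, aligned_b) > min_identity
    )
```

Passing `min_identity=50.0` and `min_length=100` applies the preferred, stricter thresholds mentioned in the text.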

Promoters from these or other highly expressed genes can be picked up by the method 
according to the invention using only routine skills of (a) making a cDNA library on mRNA isolated 
from a Phaffia strain grown under desired conditions, (b) determining (part of) the nucleotide sequence
of the (partial) cDNAs obtained in step (a), (c) comparing the obtained sequence data in step (b) to 
known sequence data, such as that stored in electronic databases, (d) cloning putative promoter fragments 
of the gene located either directly upstream of the open reading frame or directly upstream of the 
transcription start site of the gene corresponding to the expressed cDNA, and (e) verifying whether 
promoter sequences have been obtained by expressing a suitable marker, such as the G418 resistance
gene, or a suitable non-selectable "reporter" sequence downstream from a fragment obtained in (d), 
transforming the DNA into a Phaffia rhodozyma strain and determining the level of expression of the 
marker gene or reporter sequence of transformants. A transcriptional promoter is said to be of a highly 
expressed gene if it is capable of making Phaffia rhodozyma cells transformed with a DNA construct 
comprising the said promoter linked upstream of the G418 resistance marker resistant to G418 in
concentrations exceeding 200 µg per liter culture medium, preferably at least 400, more preferably
more than 600 µg/l. Especially preferred promoters are those conferring resistance against more than 800
µg/ml G418 in the growth medium.

Optionally, the transcriptional start site may be determined of the gene corresponding to the 
cDNA corresponding to a highly expressed gene, prior to cloning the putative promoter sequences; this 
may serve to locate the transcriptional initiation site more precisely, and moreover, helps to determine 
the length of the 5 '-non-translated leader of the gene. To determine the location of the transcription start 
site, reverse primer extension, or classical S1-mapping may be performed, based on the knowledge of the
cDNA sequence. Thus the exact location of the transcription promoter can be determined without undue 
burden, and the isolation of a fragment upstream of the transcription start site and containing the 
promoter, from a hybridising genomic clone (for example a phage or cosmid) is routine. Cloning the 
putative promoter fragment in front (upstream) of the coding region of, for example the G418-resistance 
gene, and transforming the gene cassette to Phaffia in order to evaluate the level of G418 resistance, and
hence the level of expression of the G418-resistance gene as a consequence of the presence of the
promoter is routine. 
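The evaluation of G418 resistance as a proxy for promoter strength can be expressed as a small classification. The sketch below is an illustration only, using the thresholds of 200, 400, 600 and 800 µg/ml G418 given earlier in the description; the function name `promoter_tier` and the tier labels are hypothetical and are not terms used in the specification.

```python
def promoter_tier(max_tolerated_ug_per_ml: float) -> str:
    """Classify a promoter by the highest G418 concentration (in µg/ml)
    at which transformants carrying the promoter-G418 cassette still grow.

    Thresholds follow the description: at least 200 µg/ml qualifies,
    with increasing preference above 400, 600 and 800 µg/ml.
    Labels are illustrative only.
    """
    if max_tolerated_ug_per_ml > 800:
        return "most preferred"
    if max_tolerated_ug_per_ml > 600:
        return "more preferred"
    if max_tolerated_ug_per_ml > 400:
        return "preferred"
    if max_tolerated_ug_per_ml >= 200:
        return "qualifies"
    return "below threshold"
```

For instance, a transformant that grows at 1000 µg/ml (1 mg/ml) G418, the level mentioned above for clones obtained with the GAPDH promoter, would fall in the highest tier.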

In a manner essentially as described for the isolation of other strong promoters, above, a 
transcription terminator may be isolated, with the proviso, that the terminator is located downstream 
from the open reading frame. The transcription stop site can be determined using procedures which are
essentially the same as for the determination of the transcription start site. All these procedures are well
known to those of skill in the art. A useful handbook is Nucleic Acid Hybridisation, Edited by B.D.
Hames & S.J. Higgins, IRL Press Ltd., 1985; or Sambrook, sub. However, it is not critical that the
transcription terminator is isolated from a highly expressed Phaffia gene, as long as it is from an
expressed gene.

Using recombinant DNA according to the invention wherein the open reading frame codes for
reduced sensitivity against G418, a transformation frequency was obtained up to 160 transformants per
µg of linear DNA, at a G418 concentration in the medium of 40 µg/ml.

About 10 to 20 times as many transformed colonies were obtained with the vector according
to the invention (pPR2) as with the prior art vector pGB-Ph9, disclosed in EP 0 590 707 A1 (see
Table 2; in the experiment of Example 7, the improvement is even more striking). 
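The two figures of merit used here, transformants per µg of DNA and the fold improvement over a reference vector, are simple ratios. The following Python sketch shows the arithmetic; the function names are hypothetical, and the colony counts in the usage note are invented for illustration and are not data from Table 2 or the Examples.

```python
def transformants_per_ug(colonies: int, dna_ug: float) -> float:
    """Transformation frequency: colonies obtained per microgram of DNA used."""
    if dna_ug <= 0:
        raise ValueError("amount of DNA must be positive")
    return colonies / dna_ug


def fold_improvement(new_frequency: float, reference_frequency: float) -> float:
    """Ratio of two transformation frequencies (new vector over reference vector)."""
    if reference_frequency <= 0:
        raise ValueError("reference frequency must be positive")
    return new_frequency / reference_frequency
```

As a purely hypothetical example, 800 colonies from 5 µg of DNA gives `transformants_per_ug(800, 5.0)`, i.e. 160 transformants per µg; compared with a hypothetical reference frequency of 10 per µg, `fold_improvement(160.0, 10.0)` gives a 16-fold improvement.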

The method according to the invention calls for conditions conducive to uptake of the 
recombinant DNA. Such conditions have been disclosed in EP 0 590 707. They include but are not limited
to the preparation of protoplasts using standard procedures known to those of skill in the art, and
subsequent incubation with the recombinant DNA. Alternatively, Phaffia cells may be incubated
overnight in the presence of LiAc and recombinant DNA. Still further alternative methods involve the
use of particle acceleration. According to a preferred embodiment, the conditions conducive to uptake
involve electroporation of recombinant DNA into Phaffia cells, such as described by Faber et al. (1994,
Current Genetics 25, 305-310). Especially preferred conditions comprise electroporation, wherein the
recombinant DNA comprises Phaffia ribosomal DNA, said recombinant DNA being in the linear form,
most preferably by cleaving said recombinant DNA in the said ribosomal region. Still further preferred
conditions comprise the use of recombinant DNA in amounts of between 1 and 10 µg per 10' cells,
more preferably about 5 µg recombinant DNA is used per 2x10' cells,
which are cultivated for 16 h at 21°C.

Once cells have been transformed according to the method, identification of transformed cells 
may take place using any suitable technique. Thus, identification may be done by hybridisation 
techniques, DNA amplification techniques such as polymerase chain reaction using primers based on the
recombinant DNA used, and the like. A preferred method of identifying transformed cells is one which
employs selection for the recombinant DNA that comprises a gene coding for reduced sensitivity against
a selective agent. Useful selective agents are G418, hygromycin, phleomycin and amdS. Genes that code
for reduced sensitivity against these selective agents are well known in the art. The open reading frames
of these genes may be used as the heterologous downstream sequence according to the invention,
allowing selective enrichment of transformed cells, prior to identification of transformed cells. Once
transformed cells have been identified they may be used for further manipulation, or used directly in the
production of valuable compounds, preferably in large scale fermentors.

It will be clear, that a very efficient method for transforming Phaffia strains has been
disclosed. Moreover, not only is the frequency of transformation high, the expression levels of the
transforming DNA are very high as well, as is illustrated by the exceptionally high resistance against
G418 of the transformed Phaffia cells when the open reading frame of the G418-resistance gene was
fused to a promoter according to the invention when compared to the G418 resistance gene under control
of the actin promoter in pGB-Ph9. It is concluded, therefore, that the GAPDH-promoter is a high-level
transcriptional promoter that can be suitably used in conjunction with any heterologous DNA sequence,
in order to reach high expression levels thereof in Phaffia strains.

It will be clear that the availability of new expression tools, in the form of the recombinant 
DNA according to the invention, creates a wealth of possibilities for producing new and valuable 
biomolecules in Phaffia. 

Preferably, the downstream sequence comprises an open reading frame coding for proteins of 
interest. For example genes already present in Phaffia, such as those involved in the carotenoid pathway,
may be manipulated by cloning them under control of the high-level promoters according to the
invention. Increased expression may change the accumulation of intermediates and/or end-products or
change the pathway of β-carotene, canthaxanthin, astaxanthin and the like. The overexpression of the crtB
gene from Erwinia uredovora will likely increase astaxanthin levels, as the product of this gene is
involved in the rate limiting step. The expression of a protein of interest may also give rise to
xanthophylls not known to be naturally produced in Phaffia, such as zeaxanthin. An open reading frame
that may be suitably employed in such a method includes but is not limited to the one encoding the
protein producing zeaxanthin (crtZ gene) obtained from Erwinia uredovora (Misawa et al. 1990,
J. Bacteriol. 172: 6704-6712). Other carotenoid synthesis genes can be obtained for example from
Flavobacterium (a gram-negative bacterium), Synechococcus (a cyanobacterium) or Chlamydomonas or
Dunaliella (algae). Obviously, carotenoid synthesis genes of a Phaffia strain, once the genes have been
isolated and cloned, are suitably cloned into a recombinant DNA according to the invention and used to
modify the carotenoid content of Phaffia strains. Examples of cloned carotenoid genes that can suitably
be overexpressed in Phaffia, are those mentioned in Fig. 8. Particularly useful is crtE from Phycomyces
blakesleeanus, encoding Geranylgeranyl Diphosphate Synthase, and crtB, encoding phytoene synthase, as
this step appears to be the rate-limiting step in carotenoid synthesis in Thermus thermophilus (Hoshino
T. et al., 1994, Journal of Fermentation and Bioengineering 22, No. 4, 423-424). Especially preferred
sources to isolate carotenoid biosynthetic genes or cDNAs from are the fungi Neurospora crassa and
Blakeslea trispora. Other yeasts shown to possess cross-hybridising species of carotenoid biosynthetic
genes are Cystofilobasidium, e.g. bisporidii and capitatum.

Carotenoid biosynthesis genes have also been identified in plants; these plant cDNAs or genes 
from plants may be used as well. Optionally, the codon usage of the Phaffia genes or cDNAs may be 
adapted to the preferred use in the host organism. 

Of special interest according to the present invention, are the DNA sequences coding for four 
different enzymes in the carotenoid biosynthesis pathway of Phaffia rhodozyma, represented in the 
sequence listing. It will be clear to those having ordinary skill in the art, that once these DNA sequences 
have been made available it will be possible to bring about slight modifications to the DNA sequence 
without modifying the amino acid sequence. Such modifications are possible due to the degeneracy of 
the genetic code. Such modifications are encompassed in the present invention. However, also
modifications in the coding sequences are envisaged that create modifications in the amino acid sequence
of the enzyme. It is well known to those of skill in the art that minor modifications are perfectly
permissible in terms of enzymatic activity. Most changes, such as deletions, additions or amino acid
substitutions, do not affect enzymatic activity, at least not dramatically. Such variants as comprise one or
more amino acid deletions, additions or substitutions can readily be tested using the complementation
test disclosed in the specification. The skilled person is also familiar with the term "conservative amino
acid substitutions", meaning substitutions of amino acids by similar amino acids residing in the same
group. The skilled person is also familiar with the term "allelic variant", meaning naturally occurring
variants of one particular enzyme. These conservative substitutions and allelic enzyme variants do not 
depart from the invention. 

As stated, at the DNA level considerable variation is acceptable. Although the invention
discloses four DNA sequences in detail, as represented in SEQIDNO: 12, SEQIDNO: 14, SEQIDNO: 16,
SEQIDNO: 18, SEQIDNO: 20, or SEQIDNO: 22, isocoding variants of the DNA sequence
represented in SEQIDNO: 12, SEQIDNO: 14, SEQIDNO: 16, SEQIDNO: 18, SEQIDNO: 20, or
SEQIDNO: 22 are also encompassed by the present invention. Those of skill in the art would have no
difficulty in adapting the nucleic acid sequence in order to optimize codon usage in a host other than P.
rhodozyma. Those of skill in the art would know how to isolate allelic variants of a DNA sequence as
represented in SEQIDNO: 12, SEQIDNO: 14, SEQIDNO: 16, SEQIDNO: 18, SEQIDNO: 20, or
SEQIDNO: 22 from related Phaffia strains. Such allelic variants clearly do not deviate from the present
invention.

Furthermore, using the DNA sequences disclosed in the sequence listing, notably SEQIDNO:
12, SEQIDNO: 14, SEQIDNO: 16 or SEQIDNO: 18, as a probe, it will be possible to isolate
corresponding genes from other strains, or other microbial species, or even more remote eukaryotic
species if desired, provided that there is enough sequence homology to detect the same using
hybridisation or amplification techniques known in the art.

Typically, procedures to obtain similar DNA fragments involve the screening of bacteria or
bacteriophage plaques transformed with recombinant plasmids containing DNA fragments from an 
organism known or expected to produce enzymes according to the invention. After in situ replication of 
the DNA, the DNA is released from the cells or plaques, and immobilised onto filters (generally nitro- 
cellulose). The filters may then be screened for complementary DNA fragments using a labeled nucleic 
acid probe based on any of the sequences represented in the sequence listing. Dependent on whether or 
not the organism to be screened for is distantly or closely related, the hybridisation and washing 
conditions should be adapted in order to pick up true positives and reduce the amount of false positives. 
A typical procedure for the hybridisation of filter-immobilised DNA is described in Chapter 5, Table 3,
pp. 120 and 121 in: Nucleic acid hybridisation - a practical approach, B.D. Hames & S.J. Higgins Eds.,
1985, IRL Press, Oxford. Although the optimal conditions are usually determined empirically, a few
useful rules of thumb can be given for closely and less closely related sequences. 

In order to identify DNA fragments very closely related to the probe, the hybridisation is
performed as described in Table 3 of Hames & Higgins, supra (the essentials of which are reproduced
below), with a final washing step at high stringency in 0.1 × SET buffer (20 times SET = 3 M NaCl, 20
mM EDTA, 0.4 M Tris-HCl, pH 7.8), 0.1% SDS at 68 °C.

To identify sequences with limited homology to the probe the procedure to be followed is as
in Table 3 of Hames & Higgins, supra, but with reduced temperature of hybridisation and washing. A
final wash at 2 × SET buffer, 50 °C, for example, should allow the identification of sequences having
about 75% homology. As is well known to the person having ordinary skill in the art, the exact
relationship between homology and hybridisation conditions depends on the length of the probe, the base
composition (% of G + C) and the distribution of the mismatches; a random distribution has a stronger
decreasing effect on Tm than a non-random or clustered pattern of mismatches.
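
The SET strengths quoted for the washes follow directly from the 20 × stock composition given above. The short sketch below is only an illustration of that arithmetic; the function names are hypothetical and nothing beyond the stated stock composition is assumed.

```python
# Composition of the 20 x SET stock as given in the text, in mol/l.
SET_20X = {"NaCl": 3.0, "EDTA": 0.020, "Tris-HCl": 0.4}


def set_buffer(strength):
    """Return the molar concentration of each component in `strength` x SET."""
    return {name: conc * strength / 20.0 for name, conc in SET_20X.items()}


def describe(strength):
    """Format the composition of `strength` x SET in millimolar units."""
    parts = [f"{conc * 1000:g} mM {name}" for name, conc in set_buffer(strength).items()]
    return f"{strength:g} x SET = " + ", ".join(parts)


if __name__ == "__main__":
    # High-stringency wash, reduced-stringency wash, upper hybridisation limit.
    for strength in (0.1, 2, 4):
        print(describe(strength))
```

On this arithmetic a 2 × SET wash corresponds to 0.3 M NaCl, 40 mM Tris-HCl and 2 mM EDTA, and the 0.1 × SET high-stringency wash to 15 mM NaCl.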

The essentials of the procedure described in Table 3, Chapter 5 of Hames & Higgins are as
follows:

(1) prehybridisation of the filters in the absence of probe; (2) hybridisation at a temperature between 50
and 68 °C in between 0.1 and 4 × SET buffer (depending on the stringency), 10 × Denhardt's solution
(100 × Denhardt's solution contains 2% bovine serum albumin, 2% Ficoll, 2% polyvinylpyrrolidone),
0.1% SDS, 0.1% sodium pyrophosphate, 50 µg/ml salmon sperm DNA (from a stock obtainable by
dissolving 1 mg/ml of salmon sperm DNA, sonicated to a length of 200 to 500 bp, allowed to stand in a
water bath for 20 min., and diluted with water to a final concentration of 1 mg/ml); hybridisation time is
not too critical and may be anywhere between 1 and 24 hours, preferably about 16 hours (o/n); the probe
is typically labeled by nick-translation using as radioactive label to a specific activity of between 5 ×
10' and 5 × 10' c.p.m./µg; (3) (repeated) washing of the filter with 3 × SET, 0.1% SDS, 0.1%
sodium pyrophosphate at 68 °C at a temperature between 50 °C and 68 °C (dependent on the stringency
desired), repeated washing while lowering the SET concentration to 0.1%, wash once for 20 min. in 4 ×
SET at room temperature, drying filters on 3MM paper, exposure of filters to X-ray film in a cassette at
'10°C for between 1 hour and 96 hours, and developing the film. 

Generally, volumes of prehybridisation and hybridisation mixes should be kept at a
minimum. All "wet" steps may be carried out in little sealed bags in a pre-heated water bath.

The above procedure serves to define the DNA fragments said to hybridise according to the 
invention. Obviously, numerous modifications may be made to the procedure to identify and isolate 
DNA fragments according to the invention. It is to be understood, that the DNA fragments so obtained 
fall under the terms of the claims whenever they can be detected following the above procedure, 
irrespective of whether they have actually been identified and/or isolated using this procedure. 

Numerous protocols, which can suitably be used to identify and isolate DNA fragments
according to the invention, have been described in the literature and in handbooks, including the quoted
Hames & Higgins, supra.

With the advent of new DNA amplification techniques, such as direct or inverted PCR, it is 
also possible to clone DNA fragments in vitro once sequences of the coding region are known. 

Also encompassed by the claims is a DNA sequence capable, when bound to nitrocellulose
filter and after incubation under hybridising conditions and subsequent washing, of specifically
hybridising to a radio-labelled DNA fragment having the sequence represented in SEQIDNO: 12,
SEQIDNO: 14, SEQIDNO: 16 or SEQIDNO: 18, as detectable by autoradiography of the filter after
incubation and washing, wherein said incubation under hybridising conditions and subsequent washing is
performed by incubating the filter-bound DNA at a temperature of at least 50 °C, preferably at least
55 °C, more preferably at least 60 °C in the presence of a solution of the said radio-labeled DNA in 0.3
M NaCl, 40 mM Tris-HCl, 2 mM EDTA, 0.1% SDS, pH 7.8 for at least one hour, whereafter the filter
is washed at least twice for about 20 minutes in 0.3 M NaCl, 40 mM Tris-HCl, 2 mM EDTA, 0.1%
SDS, pH 7.8, at a temperature of 50 °C, preferably at least 55 °C, more preferably at least 60 °C, prior to
autoradiography.

The heterologous DNA sequence according to the invention may comprise any open reading
frame coding for valuable proteins or their precursors, like pharmaceutical proteins such as human serum
albumin, IL-3, insulin, factor VIII, tPA, EPO, α-interferon, and the like, detergent enzymes, such as
proteases and lipases and the like, cell wall degrading enzymes, such as xylanases, pectinases, cellulases,
glucanases, polygalacturonases, and the like, and other enzymes which may be useful as additives for
food or feed (e.g. chymosin, phytases, phospholipases, and the like). Such genes may be expressed for
the purpose of recovering the protein in question prior to subsequent use, but sometimes this may not be
necessary as the protein may be added to a product or process in an unpurified form, for example as a
culture filtrate or encapsulated inside the Phaffia cells.

The yeast cells containing the carotenoids can be used as such or in dried form as additives to 
animal feed. Furthermore, the yeasts can be mixed with other compounds such as proteins, carbohydrates 
or oils. 

Valuable substances, such as proteins or pigments produced by virtue of the recombinant DNA 
of the invention may be extracted. Carotenoids can also be isolated for example as described by Johnson
et al. (Appl. Environm. Microbiol. 35: 1155-1159 (1978)).

Purified carotenoids can be used as colorants in food and/or feed. It is also possible to apply 
the carotenoids in cosmetics or in pharmaceutical compositions. 

The heterologous downstream sequence may also comprise an open reading frame coding for 
reduced sensitivity against a selective agent. The open reading frame coding for an enzyme giving G418 
resistance was used satisfactorily in the method according to the invention, but the invention is not 
limited to this selection marker. Other useful selection markers, such as the phleomycin resistance gene,
may be used, as disclosed in EP 590 707. Each of these genes is advantageously expressed under the
control of a strong promoter according to the invention, such as the GAPDH-promoter.

The invention is now being illustrated in greater detail by the following non-limitative
examples.

Experimental 

Strains: E. coli DH5α: supE44 ΔlacU169 (φ80 lacZΔM15) hsdR17 recA1 endA1 gyrA96 thi-1 relA1
E. coli LE392: supE44 supF58 hsdR514 galK2 galT22 metB1 trpR55 lacY1
P. rhodozyma CBS6938

Plasmids:

pUC19 (Gibco BRL)
pTZ19R
pUC-G418
pGB-Ph9 (Gist-brocades)
pMT6 (Breter H.-J., 1987, Gene 53, 181-190)
Media: LB: 10 g/l bacto tryptone, 5 g/l yeast extract, 10 g/l NaCl. Plates: +20 g/l bacto agar. When
appropriate 50 µg/ml ampicillin.

YePD: 10 g/l yeast extract, 20 g/l bacto peptone, 20 g/l glucose. Plates: +20 g/l bacto agar.
When appropriate 50 µg/ml Geneticin (G418).

Methods : All molecular cloning techniques were essentially carried out as described by Sambrook et al. 
in Molecular Cloning: a Laboratory Manual, 2nd Edition (1989; Cold Spring Harbor Laboratory Press). 

Enzyme incubations were performed following instructions described by the manufacturer. 
These incubations include restriction enzyme digestion, dephosphorylation and ligation (Gibco BRL). 

Chromosomal DNA from Phaffia rhodozyma was isolated as described in example 3 of patent
EP 0 590 707 A1 (Gist-brocades). Chromosomal DNA from K. lactis and S. cerevisiae was isolated as
described by Cryer et al. (Methods in Cell Biology 12: 39, Prescott D.M. (ed.), Academic Press, New
York).

Isolation of large (> 0.5-kb) DNA fragments from agarose was performed using the Geneclean
II Kit, whereas small (< 0.5-kb) DNA fragments or fragments from PCR mixtures were isolated
using the Wizard™ DNA Clean-Up System (Promega).

Transformation of E. coli was performed according to the CaCl2 method described by
Sambrook et al. Packaging of cosmid ligations and transfection to E. coli LE392 was carried out using
the Packagene Lambda DNA Packaging System (Promega), following the Promega protocols.

Isolation of plasmid DNA from E. coli was performed using the QIAGEN (Westburg B.V.,
NL).

Transformation of Phaffia CBS6938 was done according to the method for H. polymorpha
described by Faber et al., supra:

- Inoculate 30 ml of YePD with 1 CBS6938 colony
- Grow 1-2 days at 21 °C, 300 rpm (pre-culture)
- Inoculate 200 ml of YePD with pre-culture to OD600 = between 0 and 1 (if above 1 dilute with water)
- Grow o/n at 21 °C, 300 rpm until OD600 = 1-2 (dilute before measuring)
- Centrifuge 5 min. at 8000 rpm, room temperature. Remove supernatant thoroughly
- Resuspend pellet in 25 ml 50 mM KPi pH 7.0, 25 mM DTT (freshly made)
- Transfer suspension to a fresh sterile 30 ml centrifuge tube and incubate for 15 min. at room temperature
- Centrifuge 5 min. at 8000 rpm, 4 °C. Remove supernatant thoroughly
- Resuspend pellet in 25 ml of ice cold STM (270 mM sucrose, 10 mM Tris pH 7.5, 1 mM MgCl2)
- Centrifuge 5 min. at 8000 rpm, 4 °C
- Repeat washing step
- Resuspend cells in 0.5 ml of ice cold STM (3*10' cells/ml). Keep on ice!
- Transfer 60 µl of cell suspension to pre-cooled Eppendorf tubes containing 5 µg transforming DNA
(use precooled tips!). Keep on ice
- Transfer cell/DNA mix to precooled electroporation cuvettes (top to bottom)
- Pulse: 1.5 kV, 400 Ω, 25 µF
- Immediately add 0.5 ml of ice cold YePD. Transfer back to the Eppendorf tube using a sterile Pasteur pipette
- Incubate 2.5 hrs at 21 °C
- Plate 100 µl onto YePD-plates containing 40 µg/ml G418
- Incubate at 21 °C until colonies appear.
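
The pulse settings in the protocol can be related to the field strength quoted in Example 6 with a small calculation. The sketch below is illustrative only: the function names are hypothetical, the 2 mm gap is taken from the cuvettes named in Example 6, and the time constant is the nominal RC value of the instrument settings, which ignores the resistance of the sample itself.

```python
def field_strength_kv_per_cm(voltage_kv, gap_cm):
    """Nominal field strength across the cuvette gap."""
    return voltage_kv / gap_cm


def time_constant_ms(resistance_ohm, capacitance_uf):
    """Nominal RC time constant; ohm x microfarad gives microseconds."""
    return resistance_ohm * capacitance_uf / 1000.0


if __name__ == "__main__":
    gap_cm = 0.2  # assumption: 2 mm cuvette, as used in Example 6
    print(field_strength_kv_per_cm(1.5, gap_cm), "kV/cm")
    print(time_constant_ms(400, 25), "ms")
```

With a 1.5 kV pulse across a 2 mm gap this gives 7.5 kV/cm, the value stated in Example 6.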

Pulsed Field Electrophoresis was performed using a GENE Navigator + accessories
(Pharmacia). Conditions: 0.15 × TBE, 450 V, pulse time 0.5 s, 1.2% agarose, run time 2 h.

Polymerase Chain Reaction (PCR) experiments were performed in mixtures having the
following composition:

- 5 ng of plasmid DNA or 1 µg chromosomal DNA
- 0.5 µg of oligonucleotides (5 µg degenerated oligos in combination with chromosomal
DNA)
- 10 nmol of each dNTP
- 2.5 µmol KCl
- 0.5 µmol Tris pH 8.0
- 0.1 µmol MgCl2
- 0.5 µg gelatin
- Taq polymerase (5 U in combination with chromosomal DNA)
H2O was added to a total volume of 50 µl.

Reactions were carried out in an automated thermal cycler (Perkin-Elmer).
Conditions: 5 min. 95 °C, followed by 25 repeated cycles: 2' 94 °C, 2' 45 °C, 3' 72 °C.
Ending: 10 min. 72 °C.

Fusion PCR reactions were performed as described above, except that 2 DNA fragments with
compatible ends were added as a template in equimolar amounts.
Oligonucleotide sequences were as follows:

3005: GGGGATGCAA(A/G)CTNAGNGGNATGGG (SEQIDNO: 1); 

3006: GGGGATGC(A/G)TAICG(G/A/GKC/T)A(T/G)TC(A/G)TT(A/G)TC(A/G)TAGGA (SEQIDNO: 2); 

4206: GCGTGAGTTCTGGGCAGCCAGGATAGG (SEQIDNO: 3); 

5126: TTGAATCCACATGATGGTAAGAGTGTTAGAGA (SEQIDNO: 4); 

5127: CTTACCATGATGTGGATTGAAGAAGATGGAT (SEQIDNO: 5);

5177: CCCAAGCTTCTCGAGGTACCTGGTGGGTGCATGTATGTAC (SEQIDNO: 6);

5137: CCAAGGCCTAAAAC GGATCC CTCCAAACCC (SEQIDNO: 7);

5138: GCCAAGCTTCTCGAGCTTGATCAGATAAAGATAGAGAT (SEQIDNO: 8);

Example 1

G-418 resistance of Phaffia transformant G418-1
To determine the expression of the G418 resistance gene in pGB-Ph9, transformant G418-1
(EP 0 590 707 A1) was exposed to increasing concentrations of G418.

Two dilutions of a G418-1 culture were plated onto YePD agar containing 0-1000 µg/ml G418
(Table 1).


[G418] µg/ml    Phaffia G418-1        Phaffia G418-1        Phaffia (CBS6938)
                Dil=10-* (OD600=7)    Dil=10-* (OD600=7)    Dil=0 (OD600=5)

0               >300                  74                    >300
200             >300                  70                    0
300             >300                  61                    0
400             212                   13                    0
500             10                    2                     0
600             0                     0                     0
700             0                     0                     0
800             0                     0                     0
900             0                     0                     0
1000            0                     0                     0

Table 1. Survival of Phaffia transformant G418-1 on YePD agar medium containing increasing
concentrations of G418.
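
The colony counts in Table 1 can be converted into survival percentages relative to the plate without G418. The sketch below does this for the second G418-1 dilution (74 colonies at 0 µg/ml), the only transformant column with a countable reference plate; the function names are hypothetical and the counts are simply those read from the table.

```python
# Colony counts for the second G418-1 dilution in Table 1, keyed by G418 (ug/ml).
COUNTS = {0: 74, 200: 70, 300: 61, 400: 13, 500: 2,
          600: 0, 700: 0, 800: 0, 900: 0, 1000: 0}


def survival_percent(counts):
    """Express each count as a percentage of the count on the G418-free plate."""
    reference = counts[0]
    return {conc: 100.0 * n / reference for conc, n in sorted(counts.items())}


def lowest_concentration_below(counts, threshold_percent):
    """Return the lowest G418 concentration at which survival drops below the threshold."""
    for conc, pct in survival_percent(counts).items():
        if pct < threshold_percent:
            return conc
    return None


if __name__ == "__main__":
    for conc, pct in survival_percent(COUNTS).items():
        print(f"{conc:>5} ug/ml: {pct:5.1f} %")
    print("first concentration with < 1 % survival:",
          lowest_concentration_below(COUNTS, 1.0))
```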



At a concentration of 600 µg/ml G418 less than 1% of the plated cells survived. It can be
concluded that, despite multicopy integration of pGB-Ph9, G418-1 shows a rather weak resistance to
G418 (Scorer et al., 1994, Bio/Technology 12, p. 181 et seq.; Jimenez and Davies, 1980, Nature 182, p.
869 et seq.), most probably due to a weak action of the Phaffia actin promoter in the plasmid. The
finding that the Phaffia actin promoter works poorly prompted us to isolate promoter sequences of
Phaffia with strong promoter activity.

Example 2

Synthesis of specific probes of glycolytic genes from Phaffia rhodozyma by PCR

The polymerase chain reaction (PCR) technique was used in an attempt to synthesize a
homologous probe of the genes encoding glyceraldehyde-3-phosphate dehydrogenase (GAPDH),
phosphoglycerate kinase (PGK) and the triose phosphate isomerase (TPI) of Phaffia rhodozyma.

A set of degenerated oligonucleotides was designed based on the conserved regions in the
GAPDH-gene (Michels et al., 1986. EMBO J. 5: 1049-1056), PGK-gene (Osinga et al., 1985. EMBO J.
4: 3811-3817) and the TPI-gene (Swinkels et al., 1986. EMBO J. 5: 1291-1298).

All possible oligo combinations were used to synthesize a PCR-fragment with chromosomal
DNA of Phaffia rhodozyma (strain CBS6938) as template. Chromosomal DNA of Saccharomyces
cerevisiae and Kluyveromyces lactis as template was used to monitor the specificity of the amplification.
The PCR was performed as described above; the PCR conditions were 1' 95 °C, 2' annealing
temperature (Ta), in 5' from annealing temperature to 72 °C, 2' 72 °C, for 5 cycles, followed by 1' 95 °C,
2' 55 °C and 2' 72 °C for 25 cycles and another elongation step for 10' at 72 °C. Three different Ta were
used: 40 °C, 45 °C and 50 °C.

Under these conditions, only one primer combination produced a fragment of the expected size
on chromosomal DNA of Phaffia as template. Using the oligo combination no: 3005 and 3006 and a Ta
of 45 °C a 0.3-kb fragment was found. Specifically, the GAPDH oligonucleotides correspond with amino
acids 241-246 and 331-338 of the published S. cerevisiae sequence. It was concluded that to isolate the
promoters corresponding to the PGK- and TPI-genes from Phaffia, either further optimization of the
PCR-conditions is required, or homologous primers should be used. Another alternative method for
isolating high level promoters is disclosed in the detailed description, supra.
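
Because the successful primers are degenerate, each oligo is in fact a pool of distinct sequences. The sketch below counts and enumerates that pool for oligo 3005 as printed in "Experimental". It is an illustration only: the function names are hypothetical, the parser understands just the "(A/G)" and "N" notation used for 3005, and oligo 3006 is left out because it also contains inosine and a group that is not legible in the listing.

```python
import re
from itertools import product
from math import prod


def parse_degenerate(oligo):
    """Split an oligo written like 'AA(A/G)CTN' into per-position base choices."""
    positions = []
    for group, single in re.findall(r"\(([ACGT/]+)\)|([ACGTN])", oligo):
        if group:                      # explicit alternatives, e.g. (A/G)
            positions.append(sorted(set(group.split("/"))))
        elif single == "N":            # N stands for any of the four bases
            positions.append(list("ACGT"))
        else:                          # ordinary, non-degenerate base
            positions.append([single])
    return positions


def degeneracy(oligo):
    """Number of distinct sequences represented by the degenerate oligo."""
    return prod(len(choices) for choices in parse_degenerate(oligo))


def expand(oligo):
    """List every concrete sequence in the degenerate pool."""
    return ["".join(bases) for bases in product(*parse_degenerate(oligo))]


OLIGO_3005 = "GGGGATGCAA(A/G)CTNAGNGGNATGGG"  # as printed for SEQIDNO: 1

if __name__ == "__main__":
    print(len(parse_degenerate(OLIGO_3005)), "positions")
    print(degeneracy(OLIGO_3005), "sequences in the pool")
```

One two-fold position and three N positions give a pool of 2 × 4 × 4 × 4 = 128 sequences.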

The amplified fragment was purified from the PCR reaction, digested with BamHI and
ligated into the dephosphorylated BamHI site of pTZ19R. The ligation mixture was transformed to
competent E. coli DH5α cells prepared by the CaCl2-method and the cells were plated on LB-plates with
50 µg/ml Amp and 0.1 mM IPTG/50 µg/ml X-gal. Plasmid DNA was isolated from the white colonies.
The pTZ19R clone with the right insert, called pPRGDH1, was subsequently used for sequence analysis
of the insert.

The cloned sequence encoded for the carboxy terminal fragment of GAPDH of Phaffia as shown by 
comparison with the GAPDH-gene sequence of S. cerevisiae (Holland and Holland, 1979. J. of Biol. 
Chem. 254: 9839-9845). 

Example 3 
Isolation of the GAPDH-gene of Phaffia
To obtain the complete GAPDH-gene including expression signals the 0.3-kb BamHI fragment
of pPRGDH1 was used to screen a cosmid library of Phaffia.

Preparation of the vector for cosmid cloning.
Vector preparation was simplified because of the presence of a double cos-site in pMT6. pMT6
was digested to completion with blunt end cutter PvuII to release the cos-sites. Digestion efficiency was
checked by transformation to E. coli DH5α and found to be >99%.

The PvuII digested pMT6 was purified by phenol:chloroform extraction and ethanol
precipitation and finally dissolved in 30 µl TE at a concentration of 2 µg/µl.

The vector was subsequently digested with cloning enzyme BamHI and the vector arms were purified as
described above ("Experimental").

Preparation of target DNA 

Isolation of genomic DNA of Phaffia strain CBS6938 was performed as described in the part
named "Experimental". Cosmid pMT6 containing inserts of 25-38 kb is most efficiently packaged.
Therefore genomic DNA was subjected to partial digestion with the restriction enzyme Sau3A. Target
DNA was incubated with different amounts of enzyme. Immediately after digestion the reactions were
stopped by the extraction of DNA from the restriction mixture with phenol-chloroform. The DNA was
precipitated by using the ethanol method and the pelleted DNA after centrifugation was dissolved in a
small volume of TE. Contour clamped homogeneous electric field (CHEF) electrophoresis was used to
estimate the concentration and size of the fragments (Dawkins, 1989, J. of Chromatography 492, pp.
615-639).

Construction of genomic cosmid library. 

Ligation of approximately 0.5 µg of vector arm DNA and 0.5 µg of target DNA was performed
in a total volume of 10 µl in the presence of 5 mM ATP (to prevent blunt end ligation).
Packaging in phage heads and transfection to E. coli LE392 were carried out as described in Experimental.
The primary library consisted of 7582 transfectants with an average insert of 28 kb as determined by
restriction analysis. The library represents 3.5 times the genome with a probability of the presence of all
genes in the library of 0.97 as calculated according to Sambrook (supra). For library amplification the
transfectants were pooled by resuspending in 8 ml LB-broth. An additional 4.8 ml glycerol was added. The
transfectant mixture was divided into 16 samples of 800 µl each and stored at -80 °C. This amplified
library consisted of 2.9*10' transfectants.
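
The quoted probability of 0.97 can be reproduced with the usual library-coverage relation P = 1 - (1 - f)^N, where f is the fraction of the genome carried by one insert and N the number of clones. The sketch below is a hedged illustration: the function names are hypothetical, and the genome size is not stated in the text but is back-calculated here from the stated 3.5-fold coverage, which is an assumption of this example.

```python
import math


def probability_of_inclusion(n_clones, insert_kb, genome_kb):
    """P = 1 - (1 - f)^N with f = insert size / genome size."""
    f = insert_kb / genome_kb
    return 1.0 - (1.0 - f) ** n_clones


def clones_needed(p, insert_kb, genome_kb):
    """N = ln(1 - P) / ln(1 - f), rounded up to whole clones."""
    f = insert_kb / genome_kb
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - f))


if __name__ == "__main__":
    n_clones, insert_kb, coverage = 7582, 28.0, 3.5
    # Assumption: genome size implied by "3.5 times the genome".
    genome_kb = n_clones * insert_kb / coverage
    p = probability_of_inclusion(n_clones, insert_kb, genome_kb)
    print(f"implied genome size: {genome_kb:.0f} kb")
    print(f"probability that a given sequence is present: {p:.2f}")
```

For inserts that are small relative to the genome, P is close to 1 - exp(-coverage), so a 3.5-fold library gives roughly 0.97 regardless of the exact genome size.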

Screening of the cosmid library. 

A 100 µl sample was taken from this library and further diluted (10^) in LB-broth and 200 µl
was plated onto 10 LB-plates containing ampicillin. The plates were incubated overnight at 37 °C. Each
plate contained 300-400 colonies and filters were prepared. These filters were screened with the
GAPDH-probe using hybridization and washing conditions as described above ("Experimental"). After
16 hours exposure, 3 strong hybridization signals were found on the autoradiogram.
Cosmid DNA isolated from these positive colonies was called pPRGDHcos1, pPRGDHcos2 and
pPRGDHcos3.

Chromosomal DNA isolated from Phaffia rhodozyma strain CBS 6938 and cosmid
pPRGDHcos1 was digested with several restriction enzymes. The DNA fragments were separated,
blotted and hybridized as described before. The autoradiograph was exposed for different time periods at
-80 °C. The film showed DNA fragments of different length digested by different restriction enzymes
which hybridize with the GAPDH-probe (Fig. 1).

Furthermore, from Southern analysis of the genomic DNA of Phaffia using the GAPDH
fragment as probe, it was concluded that the GAPDH-encoding gene is present as a single copy gene in
Phaffia rhodozyma, whereas in Saccharomyces cerevisiae GAPDH is encoded by three closely related but
unlinked genes (Boucherie et al., 1995. FEMS Microb. Letters 135: 127-134).

Hybridizing fragments of pPRGDHcos1 for which a fragment of the same length in the
chromosomal DNA digested with the same enzyme was found, were isolated from an agarose gel. The
fragments were ligated into the corresponding sites in pUC19. The ligation mixtures were transformed to
competent E. coli cells. The plasmids with a 3.3-kb SalI insert and a 5.5-kb EcoRI insert were called
pPRGDH3 and pPRGDH6, respectively. The restriction map of pPRGDH3 and pPRGDH6 is shown in
Figure 2. Analysis of the sequence data of the insert in pPRGDH1 showed us that there was a HindIII
site at the C-terminal part of the GAPDH-gene. From this data it was suggested that the insert in
pPRGDH6 should contain the complete coding sequence of GAPDH including promoter and terminator
sequences.

Example 4 
Characterization of the GAPDH-gene

In order to carry out sequence analysis without the need to synthesize a number of specific
sequence primers, a number of deletion constructs of plasmids pPRGDH3 and pPRGDH6 were made
using convenient restriction sites in or near the putative coding region of the GAPDH gene.

The plasmids were digested and after incubation a sample of the restriction mixture was
analyzed by gel electrophoresis to monitor complete digestion. After extraction with phenol-chloroform
the DNA was precipitated by ethanol. After incubation at -20 °C for 30' the DNA was pelleted by
centrifugation, dried and dissolved in a large volume (0.1 ng/µl) of TE. After ligation the mixtures were
transformed to E. coli. Plasmid DNA isolated from these transformants was analyzed by restriction
analysis to reveal the right constructs. In this way the deletion constructs pPRGDH3δHIII,
pPRGDH6δBamHI, pPRGDH6δSstI and pPRGDH6δSalI were obtained (Fig. 1).

In addition to this, the 0.6-kb and 0.8-kb SstI fragments derived from pPRGDH6 were
subcloned in the corresponding site of pUC19.

Sequence analysis was carried out using pUC/M13 forward and reverse primers (Promega). The
sequencing strategy is shown in Fig. 2 (see arrows).

On the basis of homology with the GAPDH-gene sequence of S. cerevisiae (Holland and
Holland, 1979. J. of Biol. Chem. 254: 9839-9845) and K. lactis (Shuster, 1990. Nucl. Acids Res. 18,
4271) and the known splice site consensus (J.L. Woolford, 1989. Yeast 5: 439-457), the introns and the
possible ATG start were postulated.

The GAPDH gene has 6 introns (Fig. 1) and encodes a polypeptide of 339 amino acids. This
was completely unexpected considering the genomic organisation of the GAPDH genes of K. lactis and
S. cerevisiae, which have no introns and both consist of 332 amino acids. The homology on the amino
acid level between the GAPDH gene of Phaffia and K. lactis and S. cerevisiae is 63% and 61%,
respectively.

Most of the introns in the GAPDH gene are situated at the 5' part of the gene. Except intron III all
introns contain a conserved branch-site sequence 5'-CTPuAPy-3' found for S. cerevisiae and S. pombe.

By computer analysis of the upstream sequence using PC-gene, 2 putative eukaryotic promoter
elements, a TATA-box (position 249-263 in SEQIDNO: 11) and a number of putative Cap signals (between
position 287 and 302 in SEQIDNO: 11), were identified.

Example 5 

Cloning of the GAPDH promoter fused to G418 in pUC-G418.
In order to construct a transcription fusion between the GAPDH promoter and the gene
encoding G418 resistance the fusion PCR technique was used.

Using plasmid pPRGDH6 the GAPDH promoter could be amplified by standard PCR protocols
("Experimental").

In the PCR mix pPRGDH6 and oligos No. 5177 and 5126 (sequences in "Experimental") were
used. A 416 bp DNA fragment was generated containing the entire GAPDH promoter sequence. In
addition this fragment also contains a HindIII, XhoI and a KpnI restriction site at its 5' end and 12 nt
overlap with the 5' end of the gene encoding G418 resistance.

The 217 bp portion of the 5' end of the G418 coding sequence was also amplified by PCR using
pUC-G418 and oligos 4206 and 5127. A 226 bp DNA fragment was obtained containing the 217 bp
5' end of G418 and having a 9 nucleotides overlap with the 3' end of the earlier generated GAPDH
promoter fragment. It also contained an MscI site at its 3' end.

The PCR fragments were purified from the PCR mixture using the WIZARD Kit.
Approximately 1 µg of the GAPDH promoter fragment and 1 µg of the G418 PCR fragment were used
together with oligos 5177 and 4206 in a fusion PCR experiment (Experimental). A 621 bp DNA
fragment was generated, containing the GAPDH promoter directly fused to the 5' portion of G418. After
purification the DNA fragment was digested with MscI and KpnI. The 3.4 kb MscI-KpnI fragment of
pUC-G418, containing pUC sequences and the 3' portion of G418, was used as a vector.
The ligation mixture was transformed to competent E. coli DH5α cells. Transformant colonies
containing the fusion PCR DNA inserted were identified by digestion with different restriction enzymes.
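
The fragment sizes reported for the fusion PCR are internally consistent, which can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function name is hypothetical, and it assumes that the 12 nt and 9 nt primer tails together form the region shared by the two fragments.

```python
def fused_length(len_a, len_b, tail_on_a, tail_on_b):
    """Length of a fusion PCR product from two fragments with overlapping ends.

    tail_on_a: nucleotides of fragment B's sequence carried at the end of A.
    tail_on_b: nucleotides of fragment A's sequence carried at the start of B.
    The shared region is counted once in the fused product.
    """
    overlap = tail_on_a + tail_on_b
    return len_a + len_b - overlap


if __name__ == "__main__":
    promoter_fragment = 416   # bp, carries 12 nt of the G418 5' end
    g418_fragment = 226       # bp, 217 bp of G418 plus 9 nt of promoter
    print(fused_length(promoter_fragment, g418_fragment, 12, 9), "bp")
```

The same result follows from adding the promoter-derived part (416 - 12 = 404 bp) to the 217 bp G418 portion, giving the 621 bp product reported above.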

Thus, plasmid pPRl was obtained, containing the GAPDH promoter directly fused to the G418 
marker gene. Three pPRl vectors isolated from independent transformants were used in further cloning 
experiments. 

To target the plasmid, after transformation, to a specific integration site a 3.0-kb SstI fragment
containing a part of the ribosomal DNA of Phaffia was cloned in pPR1. The ribosomal DNA fragment
was isolated from an agarose gel after digestion with SstI of plasmid pGB-Ph11 (EP 590 707 A1). This
fragment was ligated in the dephosphorylated SstI site of pPR1. The ligation mixture was transformed to
competent E. coli cells. Plasmid DNA was isolated and using restriction analysis it was shown that
several colonies contain the expected plasmid pPR2. The complete cloning strategy is shown in Fig. 3.

Example 6 
Transformation of Phaffia with pPR2.
Transformation of Phaffia strain 6938 was performed using an electroporation procedure as
previously described by Faber et al. (1994, Curr. Genet. 25: 305-310) with the following
modifications:

- Electropulsing was performed using the Bio-rad Gene Pulser with Pulse Controller and with Bio-rad
2 mm cuvettes.

- Phaffia was cultivated for 16 h at 21 °C.

- Per transformation 2x10' cells were used together with 5 µg of linearized vector. Linearization was
done in the rDNA sequence using ClaI to enable integration at the rDNA locus in the Phaffia genome.
Following the electric pulse (7.5 kV/cm, 400 Ω and 25 µF) 0.5 ml YePD medium was added to the
cell/DNA mixture. The mixture was incubated for 2.5 h at 21 °C and subsequently spread on 5 selective
YePD agar plates containing 40 µg/ml G418.

As shown in Table 2 we were able to obtain 115 transformants per µg
DNA; the average transformation frequency was 50 transformants/µg pPR2 as judged over a number of
experiments. Transformation with the closed circular form of pPR2 did not result in transformants,
suggesting that there is no autonomously replicating sequence present within the vector sequences. Using
pPR2 a 10 to 50-fold increase in transformation frequency was found compared to a previously
constructed transformation vector for Phaffia, called pGB-Ph9. In this latter vector a translation fusion
was made between the 5' part of the actin gene of Phaffia and G418.

In order to analyze the level of resistance of transformants the DNA/cell mixture was plated
onto selective plates containing different amounts of G418. Although the total number of transformants
decreases with increasing amounts of G418, we were still able to obtain a considerable number of
transformants (Table 3).

In another experiment 30 transformants obtained under standard selection conditions (40 µg/ml)
were transferred to plates containing 50, 200 or 1000 µg/ml. After incubation of the plates at 21 °C for
4-5 days, 23 transformants out of 30 tested were able to grow on plates containing 200 µg/ml G418.
One transformant was able to grow on plates containing up to and above 1000 µg/ml G418.

Table 2. Transformation frequency of pGB-Ph9 and pPR2.

                          Exp. 1    Exp. 2

                            69         8
pGB-Ph9 x BglII             46         7
pPR2 ccc                   n.d.      n.d.
pPR2 (A) x ClaI            714        56
     (B)                   639       124

Total number of transformants (> 1 mm) in different transformation experiments after 4-5 days
incubation.

Table 3. Comparison of G418 sensitivity as a result of two different G418-resistance genes in
pGB-Ph9 and pPR2

G418 concentration      Number of transformants
(µg/ml)                 pPR2 x ClaI    pGB-Ph9 x BglII (=pYac4)

 40                         480             2
 50                         346
 60                         155
 70                          61
 80                         141
 90                          72
100                          64
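
To make the trend in Table 3 easier to read, the counts can be expressed relative to the count at the standard selection level of 40 µg/ml. The sketch below assumes that the single values listed for 50 to 100 µg/ml belong to the pPR2 column; the function name is hypothetical.

```python
# Number of pPR2 transformants per G418 concentration (ug/ml), as read from Table 3.
# Assumption: the unlabelled single values belong to the pPR2 column.
counts = {40: 480, 50: 346, 60: 155, 70: 61, 80: 141, 90: 72, 100: 64}


def relative_to_baseline(counts, baseline=40):
    """Express each count as a fraction of the count at the baseline concentration."""
    base = counts[baseline]
    return {conc: n / base for conc, n in sorted(counts.items())}


relative = relative_to_baseline(counts)
for conc, fraction in relative.items():
    print(f"{conc:>3} ug/ml G418: {fraction:.2f} of the count at 40 ug/ml")
```

The counts do not fall monotonically (80 µg/ml gives more colonies than 70 µg/ml), so the fractions are best read as a rough indication of the trend in a single experiment.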

Analysis of pPR2 transformants. 

To analyse the integration event and the number of integrated vector copies total genomic DNA
from six independent transformants was isolated. To this end these transformants were cultivated under
selective conditions, i.e. YePD + 50 µg/ml G418. Chromosomal DNA was digested with ClaI. The DNA
fragments were separated by gel electrophoresis and transferred to nitrocellulose and the Southern blot
was probed with Phaffia DNA.

Besides the rDNA band of 9.1 kb an additional band of 7.1 kb of similar fluorescing intensity
was observed in the transformants. This band corresponds to the linearised form of pPR2. From the
intensity of these bands it was concluded that the copy number was about 100 - 140 copies of pPR2.
These results are similar to those observed for pGB-Ph9, ruling out that the improved G418-resistance is
due to differences in copy number of integrated vectors alone. It is not known whether the multiple copy
event is caused by multiple copy integration of pPR2 or by the amplification of a single copy in the
rDNA or a combination of both events.






Example 7 

Construction of pPR2T by cloning the GAPDH-terminator into pPR2
Eukaryotic mRNAs contain modified terminal sequences, specifically the 3' terminal poly(A). As
the prokaryotic gene encoding G418 resistance lacks eukaryotic termination signals, which might affect
proper transcription termination and mRNA stability (1994, Raue, H.A., TIBTECH 12: 444-449), a part
of the 3' non-coding sequence of GAPDH was introduced.

To that end, a 307 bp fragment, consisting of 281 bp of the 3' non-coding region of GAPDH and
additional cloning sequences, was amplified by PCR using the oligos 5137 and 5138 ("Experimental").

The upstream oligo 5137 consists of the last 14 nucleotides of the coding and 17 nucleotides of the 3'
non-coding region of GAPDH. By base substitutions of the 5th (T -> A) and 8th (T -> C) nucleotide
of the non-coding sequence a BamHI restriction site was introduced. In addition this fragment contains a
XhoI and a HindIII restriction site at its 3' end.

The PCR fragment was purified from the PCR mixture using the WIZARD Purification Kit and
digested with BamHI and HindIII. A 288 bp fragment was isolated and cloned into the corresponding
sites of the previously constructed Phaffia transformation vector pPR2, yielding pPR2T.
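
The way two base substitutions in the primer create a BamHI site (GGATCC) can be illustrated in a few lines. The 17-nucleotide sequence below is a made-up placeholder, not the actual GAPDH 3' non-coding sequence; it was chosen only so that a T -> A change at position 5 and a T -> C change at position 8 produce the site. The function name is likewise hypothetical.

```python
BAMHI_SITE = "GGATCC"


def substitute(seq, changes):
    """Apply 1-based point substitutions given as {position: (old_base, new_base)}."""
    bases = list(seq)
    for position, (old, new) in changes.items():
        if bases[position - 1] != old:
            raise ValueError(f"expected {old} at position {position}")
        bases[position - 1] = new
    return "".join(bases)


# Placeholder stand-in for the first 17 non-coding nucleotides (NOT the real sequence)
placeholder = "ATGGTTCTAGCATGCAA"
mutated = substitute(placeholder, {5: ("T", "A"), 8: ("T", "C")})

print(placeholder, BAMHI_SITE in placeholder)
print(mutated, BAMHI_SITE in mutated)
```

With this placeholder the site appears at nucleotides 3 to 8 of the mutated sequence, which matches the positions of the two substitutions described in the text.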

Upon transformation of Phaffia, using G418 as selective agent, the transformation frequencies
(number of transformants per µg of DNA) obtained with the improved construct pPR2T were
approximately 5 to 10 times higher than the transformation frequency of pPR2 (i.e. without a Phaffia
homologous transcription termination signal). The results of a typical experiment are given in Table 4.



Table 4. Transformation frequency at 50 µg/ml G418 for pGB-Ph9, pPR2 and pPR2T

Vector                  transformants     transformants/µg DNA

pGB-Ph9 (ccc)
pGB-Ph9 (xBglII)             60                   1
pPR2 (ccc)
pPR2 (xClaI)             3000 - 9600          50 - 160
pPR2T (ccc)
pPR2T (xClaI)               45600               760
pPR2T (xSfiI)                1080                18

Phaffia cells transformed with pPR2T were tested for their ability to grow on high levels of
G418. The level of G418 on which growth is still possible was taken as a measure of the expression
level of the G418 resistance gene in transformants, as a result of the presence of the Phaffia promoter
and/or terminator. Preliminary results indicate that the number of transformants able to grow on high
levels of G418 is significantly higher than without terminator.



In summary 

From the above results, it was concluded that the presence of the GAPDH-promoter (pPR2)
resulted in a considerable increase of the transformation frequency (from 1 to at least 50 per µg of
DNA) when compared to the vector containing the actin-promoter (pGB-Ph9). These results are in line
with the results obtained with the G418 sensitivity test (Tables 3 and 4), which indicate superior
expression levels under the control of the GAPDH promoter. The possibility that the difference in
transformation frequency could be due solely to the difference in linearising the vectors (BglII, ClaI
and SfiI all cut inside the ribosomal DNA locus, but at different positions) was ruled out by comparison
of pPR2(xSfiI) with pGB-Ph9(xSfiI). The difference in transformation frequency between the two vectors,
pPR2 and pGB-Ph9, linearised with SfiI is still considerable. However, it is concluded that the choice of the
linearisation site does have an effect on the transformation frequency; linearisation with ClaI is preferred.
The improvements obtained by using a high-level promoter, such as GAPDH, are irrespective of
whether a homologous terminator is used (pPR2 (without homologous terminator) performs far better
than pGB-Ph9, both in G418 sensitivity tests and in terms of transformation frequency).

The presence of a homologous terminator results in both higher transformation frequencies and
higher expression levels; this result is concluded to be independent of the promoter used. Preliminary
results indicate that considerable improvements are obtained when the pGB-Ph9 construct is completed
with a transcription terminator, such as the GAPDH-terminator used in pPR2T.

The following Examples illustrate the isolation of DNA encoding enzymes involved in the
carotenoid biosynthesis pathway of Phaffia rhodozyma. These DNA sequences can suitably be used for a
variety of purposes; for example to detect and isolate DNA sequences encoding similar enzymes in other
organisms, such as yeast, by routine hybridisation procedures, or to isolate the transcription promoters
and/or terminators, which can be used to construct expression vectors for both heterologous as well as
homologous downstream sequences to be expressed. The DNA sequences encoding carotenoid
biosynthesis genes can suitably be used to study over-expression, either under the control of their
own promoters or heterologous promoters, such as the glycolytic pathway promoters illustrated above.
For example, transformation of Phaffia rhodozyma with carotenoid encoding DNA sequences according
to the invention effectively results in amplification of the gene with respect to the wild-type situation,
and as a consequence thereof in overexpression of the encoded enzyme.

Hence, the effect of over-expression of one or more genes encoding carotenoid biosynthesis enzymes can
be studied. It is envisaged that mutant Phaffia strains can be obtained producing higher amounts of
valuable carotenoids, such as β-carotene, canthaxanthin, zeaxanthin and/or astaxanthin. Similarly, the
DNA sequences encoding enzymes involved in the carotenoid biosynthesis pathway can be introduced
into other hosts, such as bacteria, for example E. coli, yeasts, for example species of Saccharomyces,
Kluyveromyces, Rhodosporidium, Candida, Yarrowia, Phycomyces, Hansenula, Pichia, fungi, such as
Aspergillus, Fusarium, and plants such as carrot, tomato, and the like. The procedures of transformation
and expression requirements are well known to persons skilled in these arts.

Strains: E. coli XL-Blue-MRF' Δ(mcrA)183 Δ(mcrCB-hsdSMR-mrr)173 endA1 supE44 thi-1 recA1
gyrA96 relA1 lac [F' proAB lacI^q ZΔM15 Tn10 (Tet^r)]

ExAssist™ interference-resistant helper phage (Stratagene®)

P. rhodozyma CBS6938 or

P. rhodozyma asta 1043-3

Plasmids used for cloning:

pUC19 Ap^r (Gibco BRL)

Uni-ZAP™ XR vector (lambda ZAP® II vector digested with EcoRI-XhoI, CIAP
treated; Stratagene®)

Media: LB: 10 g/l bacto tryptone, 5 g/l yeast extract, 10 g/l NaCl. Plates: +20 g/l bacto agar.
When appropriate 50-100 µg/ml ampicillin (Ap), 30 µg/ml chloramphenicol (Cm) and
1 mM isopropyl-1-thio-β-D-galactopyranoside (IPTG) were added.

YePD: 10 g/l yeast extract, 20 g/l bacto peptone, 20 g/l glucose. Plates: +20 g/l bacto
agar.

All molecular cloning techniques were essentially carried out as described by Sambrook et al. in
Molecular Cloning: a Laboratory Manual, 2nd Edition (1989; Cold Spring Harbor Laboratory Press).
Transformation of E. coli was performed according to the CaCl2 method described by Sambrook et al.

Enzyme incubations were performed following instructions described by the manufacturer.
These incubations include restriction enzyme digestion, dephosphorylation and ligation (Gibco BRL).
Isolation of plasmid DNA from E. coli was performed using the QIAGEN (Westburg B.V. NL).

For sequence analysis deletion constructs and oligonucleotides were made to sequence the
complete sequence using a Taq DYE Primer Cycle Sequencing kit (Applied Biosystems).

Example 8 
Description of plasmids

Plasmids (pACCAR25ΔcrtE, pACCAR25ΔcrtB, pACCRT-EIB, pACCAR16ΔcrtX and
pACCAR25ΔcrtX), which contain different combinations of genes involved in the biosynthesis of
carotenoids in Erwinia uredovora, were gifts from Prof. Misawa (Kirin Brewery Co., Ltd., Japan). The
biosynthetic route of carotenoid synthesis in Erwinia uredovora is shown in Fig. 8.
In addition a derivative of pACCAR25ΔcrtX, designated pACCAR25ΔcrtXΔcrtI, was made in our
laboratory. By the introduction of a frameshift in the BamHI restriction site the crtI gene was
inactivated. E. coli strains harboring this plasmid accumulate phytoene, which can be monitored by the
white phenotype of the colony.

All plasmids are derivatives of plasmid pACYC184 (Rose RE; Nucl. Acids Res. 16 (1988) 355), which
contains a marker conferring chloramphenicol-resistance. Furthermore these plasmids and derivatives
thereof contain a replication origin that is compatible with vectors such as pUC and pBluescript. Each
plasmid contains a set of carotenoid biosynthetic genes of Erwinia uredovora mediating the formation of
different carotenoids in E. coli. The complete list of plasmids used in this study is shown in Table 5.



Table 5: Summary of carotenoid producing E. coli strains used in this study.

PLASMID:              GENOTYPE:                       CAROTENOID ACCUMULATED:         COLOR PHENOTYPE:

pACCAR25ΔcrtE         crtB; crtI; crtY; crtX; crtZ    farnesyl pyrophosphate/         white
                                                      isopentenyl pyrophosphate
pACCAR25ΔcrtB         crtE; crtI; crtY; crtX; crtZ    geranylgeranyl pyrophosphate    white
pACCAR25ΔcrtXΔcrtI    crtE; crtB; crtY; crtZ          phytoene                        white
pACCRT-EIB            crtE; crtB; crtI                lycopene                        red
pACCAR16ΔcrtX         crtE; crtB; crtI; crtY          β-carotene                      yellow
pACCAR25ΔcrtX         crtE; crtB; crtI; crtY; crtZ    zeaxanthin                      yellow/orange

Genes encoding: crtE, geranylgeranyl pyrophosphate synthase; crtB, phytoene synthase; crtI,
phytoene desaturase; crtY, lycopene cyclase; crtZ, β-carotene hydroxylase; crtX, zeaxanthin
glycosylase
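
The logic behind Table 5, and behind the colour screens used in the following Examples, can be sketched as a walk along the pathway that stops at the first missing gene. The model below is a deliberate simplification: it assumes a strictly linear route and that the product of the last completed step is what accumulates. The names PATHWAY, COLOR and accumulated are illustrative only.

```python
# Ordered steps of the Erwinia uredovora route: (gene, product formed by that gene)
PATHWAY = [
    ("crtE", "geranylgeranyl pyrophosphate"),
    ("crtB", "phytoene"),
    ("crtI", "lycopene"),
    ("crtY", "beta-carotene"),
    ("crtZ", "zeaxanthin"),
    ("crtX", "zeaxanthin-diglucoside"),
]
START = "farnesyl pyrophosphate/isopentenyl pyrophosphate"

# Colony colours as listed in Table 5 and in the text of the Examples
COLOR = {
    START: "white",
    "geranylgeranyl pyrophosphate": "white",
    "phytoene": "white",
    "lycopene": "red",
    "beta-carotene": "yellow",
    "zeaxanthin": "yellow/orange",
    "zeaxanthin-diglucoside": "yellow/orange",
}


def accumulated(genes):
    """Return the product that accumulates: the last step reached before a missing gene."""
    product = START
    for gene, next_product in PATHWAY:
        if gene not in genes:
            break
        product = next_product
    return product


strains = {
    "pACCAR25(delta)crtE": {"crtB", "crtI", "crtY", "crtX", "crtZ"},
    "pACCAR25(delta)crtX(delta)crtI": {"crtE", "crtB", "crtY", "crtZ"},
    "pACCRT-EIB": {"crtE", "crtB", "crtI"},
}
for name, genes in strains.items():
    product = accumulated(genes)
    print(f"{name}: {product} ({COLOR[product]})")
```

Adding the missing gene to a strain's gene set moves the end point further down the pathway, which is the colour change the complementation screens rely on.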

Example 9 

Construction of cDNA library of Phaffia rhodozyma

a) Isolation of total RNA from Phaffia rhodozyma

All solutions were made in DEPC-treated distilled water and all equipment was soaked overnight in
0.1% DEPC and then autoclaved.

A 300 ml Erlenmeyer containing 60 ml YePD culture medium was inoculated with Phaffia rhodozyma
strain CBS6938/1043-3 from a preculture to a final OD600 of 0.1. This culture was incubated at 21 °C
(300 rpm) until the OD600 had reached 3-4.

The cells were harvested by centrifugation (4 °C, 8000 rpm, 5 min) and were resuspended in 12 ml of
ice-cold extraction buffer (0.1 M Tris-HCl, pH 7.5; 0.1 M LiCl; 0.1 mM EDTA). After centrifugation cells
were resuspended in 2 ml of ice-cold extraction buffer, and 4 g of glass beads (0.25 mm) and 2 ml phenol
were added.

The mixture was vortexed 5 times at maximum speed for 30 s with 30 s cooling intervals on
ice.

The cell/glass beads/phenol mixture was centrifuged (5 min, 15,300 rpm, 4 °C) and the aqueous phase
(sup 1) was transferred to a fresh tube and was kept on ice.
The phenolic phase was re-extracted by adding an additional volume of 1 ml extraction buffer and 2 ml
phenol.

After centrifugation (5 min, 15,300 rpm, 4 °C), the aqueous phase was transferred to sup 1 and
extracted with an equal volume of phenol:chloroform.

After centrifugation (5 min, 15,300 rpm, 4 °C), the aqueous phase was transferred to a fresh tube and
0.1 volume of 3 M NaAc, pH 5.5 and 2.5 volumes of EtOH were added to precipitate RNA (incubation
overnight at -20 °C).

The precipitate was collected by centrifugation (10 min, 15,300 rpm, 4 °C), excess liquid was drained
off and the RNA pellet was washed with ice-cold 70 % EtOH.
After removing excess liquid the RNA was resuspended in 200 - 800 µl DEPC-treated water. RNA was
stored at -70 °C. A 60 ml culture yielded 400 - 1500 µg total RNA. The integrity of total RNA was
checked by formaldehyde RNA gel electrophoresis.

b) Selection of poly(A)+ RNA

Isolation of poly(A)+ RNA from total RNA was carried out essentially as described by Sambrook et al., 1989
(Molecular cloning, a laboratory manual, second edition) using the following solutions.
All solutions were prepared in DEPC-treated water and autoclaved.
RNA denaturation buffer: 1 M NaCl; 18% (v/v) DMSO.

Column-loading buffer (HEND): 10 mM Hepes, pH 7.6; 1 mM EDTA; 0.5 M NaCl; 9% (v/v) DMSO.
Elution buffer (HE): 10 mM Hepes, pH 7.6; 1 mM EDTA.

Oligo(dT)-cellulose Type 7 was supplied by Pharmacia Biotech. 0.1 g (dry weight) of oligo(dT)-cellulose
was added to 1 ml HEND and the suspension was gently shaken for 1 h at 4 °C. Total RNA (1.5
mg dissolved in 500 µl) and 1 ml 1 M NaCl; 18% (v/v) DMSO was heated to 65 °C for 5 min. Then
600 µl NaCl/DMSO was added to the RNA, mixed and placed on ice for 5 min. The poly(A)+ isolation
was carried out by two cycles of purification. The final yield was about 45 µg poly(A)+ RNA.
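
The yields quoted in sections a) and b) can be put on a common scale with a short calculation. The sketch assumes the mass units in the text are micrograms (45 µg poly(A)+ RNA recovered from 1.5 mg total RNA, and 400 - 1500 µg total RNA per 60 ml culture); the function name is hypothetical.

```python
def percent_recovered(recovered_ug, input_ug):
    """Recovered mass as a percentage of the input mass."""
    return 100.0 * recovered_ug / input_ug


# poly(A)+ RNA recovered from the total RNA loaded on oligo(dT)-cellulose
poly_a_percent = percent_recovered(45.0, 1500.0)

# total RNA yield per ml of culture (range quoted for a 60 ml culture)
culture_ml = 60.0
total_rna_per_ml = (400.0 / culture_ml, 1500.0 / culture_ml)

print(f"poly(A)+ fraction of total RNA: {poly_a_percent:.1f} %")
print(f"total RNA yield: {total_rna_per_ml[0]:.1f} - {total_rna_per_ml[1]:.1f} ug per ml culture")
```

On these figures the poly(A)+ fraction comes to about 3 % of the total RNA loaded.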

c) cDNA synthesis 

cDNAs were synthesized from 7.5 µg poly(A)+ RNAs using the cDNA Synthesis Kit (#200401;
Stratagene®). Synthesis was carried out according to the instruction manual with some minor
modifications.

SuperScript™ II RNase H- Reverse Transcriptase (Gibco BRL) was used in the first strand reaction
instead of MMLV-RT.

The following reagents were added in a microcentrifuge tube:
3 µl of poly(A)+ RNAs

2 µl of linker-primer
23.5 µl DMQ

Incubate 10 min at 70 °C, spin quickly in microcentrifuge and add:
10 µl of 5 X First Strand Buffer (provided by Gibco BRL)
5 µl of 0.1 M DTT (provided by Gibco BRL)

3 µl of first strand methyl nucleotide mixture

1 µl of RNase Block Ribonuclease Inhibitor (40 U/µl)
Anneal template and primers by incubating the mixture at 25 °C for 10 min followed by 2 min at
42 °C and finally add:

2.5 µl SuperScript™ II RNase H- Reverse Transcriptase
The first-strand reaction was carried out at 42 °C for 1 h.
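
As a quick consistency check on the first-strand mix, the listed volumes can be summed and the final buffer and DTT concentrations derived from the total. The sketch reads all volumes in the list as microlitres, which is an assumption about the original units; the variable names are illustrative.

```python
# Component volumes in microlitres, in the order they are added
components_ul = [
    ("poly(A)+ RNA", 3.0),
    ("linker-primer", 2.0),
    ("DMQ", 23.5),
    ("5x First Strand Buffer", 10.0),
    ("0.1 M DTT", 5.0),
    ("first strand methyl nucleotide mixture", 3.0),
    ("RNase Block Ribonuclease Inhibitor", 1.0),
    ("reverse transcriptase", 2.5),
]

total_ul = sum(volume for _, volume in components_ul)

# Final concentrations after dilution into the complete reaction
buffer_final_x = 5.0 * 10.0 / total_ul      # 5x stock, 10 ul added
dtt_final_mm = 100.0 * 5.0 / total_ul       # 0.1 M = 100 mM stock, 5 ul added

print(f"total volume: {total_ul:.1f} ul")
print(f"buffer: {buffer_final_x:.1f}x, DTT: {dtt_final_mm:.1f} mM")
```

Read this way, the components add up to a 50 µl reaction in which the buffer is at 1x and DTT at 10 mM.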
Size fractionation was carried out using the Geneclean® II kit (supplied by BIO 101, Inc.). The volume of the
cDNA mixture obtained after XhoI digestion was brought up by adding DMQ to a final volume of 200 µl.
Three volumes of NaI were added and the microcentrifuge tube was placed on ice for 5 min. The
pellet of glassmilk was washed three times using 500 µl New Wash. Finally the cDNA was eluted in 20
µl DMQ.

The yield of cDNA was about 1 µg using these conditions.
d) cDNA cloning 

A cDNA library was constructed in the Uni-ZAP™ XR vector using 100 ng cDNAs. Ligation was
performed twice by overnight incubation at 12 °C. The cDNA library was packaged using the
Packagene® lambda DNA packaging system (Promega) according to the instruction manual. The
calculated titer of the cDNA library was 3.5 x 10** pfu.

e) Mass excision

Mass excision was carried out as described in the protocol using derivatives of E. coli XL-Blue-MRF' as
acceptor strain (see Table 5). Dilutions of cell mixtures were plated onto 145 mm LB agar plates
containing ampicillin, chloramphenicol and IPTG, yielding 250 - 7000 colonies on each plate. The plates
were incubated overnight at 37 °C and further incubated one or two more days at room temperature.

Example 10 

Cloning of the geranylgeranyl pyrophosphate synthase gene (crtE) of Phaffia rhodozyma

a) Isolation of cDNA clone

The entire library was excised into farnesyl pyrophosphate/isopentenyl pyrophosphate accumulating
cells of E. coli XL-Blue-MRF', which carry the plasmid pACCAR25ΔcrtE (further indicated as
XL-Blue-MRF'[pACCAR25ΔcrtE]). The screening for the crtE gene was based on the color of the
transformants. Introduction of the crtE gene in a genetic background of
XL-Blue-MRF'[pACCAR25ΔcrtE] would result in a restoration of the complete route for the biosynthesis of
zeaxanthin-diglucoside, which could be monitored by the presence of a yellow/orange pigmented colony.
About 8,000 colonies were spread on LB agar plates containing appropriate antibiotics and IPTG. One
colony was found to have changed to a yellow/orange color.

b) Characterization of complementing cDNA clone 

This colony was streaked on LB-ampicillin agar plates. Plasmid DNA was isolated from this yellow
colony and found to include a 1.85 kb fragment (Fig 2A). The resulting plasmid, designated pPRcrtE,
was used for retransformation experiments (Table 6). Only the transformation of
XL-Blue-MRF'[pACCAR25ΔcrtE] with pPRcrtE resulted in a white to yellow color change in phenotype. To test
whether the color change was due to complementation and not caused by the cDNA alone, pPRcrtE was
transformed into XL-Blue-MRF'. Selection of transformants on LB-ampicillin agar plates containing
IPTG did not result in color changes of the colonies (Table 6). Therefore we tentatively concluded that
we have cloned a cDNA of P. rhodozyma encoding GGPP synthase, which is involved in the conversion
of IPP and FPP to GGPP.



Table 6: Color phenotype of carotenoid producing E. coli strains transformed with pPRcrtE.

                                    pUC19 (control)    pPRcrtE

XL-Blue-MRF'                        white              white
(Ap, IPTG)

XL-Blue-MRF'[pACCAR25ΔcrtE]         white              yellow/orange
(Ap, Cm, IPTG)

XL-Blue-MRF'[pACCAR25ΔcrtB]         white              white
(Ap, Cm, IPTG)

Transformation: 10 ng of each plasmid was mixed with CaCl2 competent E. coli cells.
Transformant cells were selected by plating 1/10 and 1/100 volume of the DNA/cell mixture on
LB agar-medium containing the appropriate antibiotics (in brackets).



c) Sequence analysis of cDNA fragment 

Plasmid pPRcrtE was used to determine the nucleotide sequence of the 1.85 kb cDNA.
The sequence comprised 1830 nucleotides and a 31 bp poly(A) tail. An open reading frame (ORF) of
375 amino acids was predicted. The nucleotide sequence and deduced amino acid sequence are shown as
SEQIDNOs: 14 and 15, respectively. A search in SWISS-PROT protein sequence data bases using
the Blitz amino acid sequence alignment program indicated amino acid homology (52 % in 132 aa
overlap; Neurospora crassa), especially to the conserved domain I in geranylgeranyl-PPi synthase
enzymes of different organisms (Botella et al., Eur. J. Biochem. (1995) 233: 238-248).

Example 11

Cloning of the phytoene synthase gene (crtB) of Phaffia rhodozyma
a) Isolation of cDNA clone

The entire library was excised into geranylgeranyl pyrophosphate accumulating cells of E. coli
XL-Blue-MRF', which carry the plasmid pACCAR25ΔcrtB (further indicated as
XL-Blue-MRF'[pACCAR25ΔcrtB]). The screening for the crtB gene was based on the color of the transformants.
Introduction of the crtB gene in a genetic background of XL-Blue-MRF'[pACCAR25ΔcrtB] would result
in a restoration of the complete route for the biosynthesis of zeaxanthin-diglucoside, which could be
monitored by the presence of a yellow/orange pigmented colony.

About 25,000 colonies were incubated on LB agar plates containing appropriate antibiotics and IPTG.
Three colonies were found to have changed to a yellow/orange color.

b) Characterization of complementing cDNA clone

These colonies were streaked on LB-ampicillin agar plates. Plasmid DNA, designated pPRcrtB1 to 3,
was isolated from these yellow colonies and found to include a 2.5 kb fragment (Fig 2B). One of the
resulting plasmids, pPRcrtB1, was used for retransformation experiments (Table 7). Only the
transformation of XL-Blue-MRF'[pACCAR25ΔcrtB] with pPRcrtB resulted in a white to yellow color
change in phenotype. Therefore we tentatively conclude that we have cloned a cDNA of P. rhodozyma
encoding phytoene synthase, which is involved in the conversion of 2 GGPP molecules via prephytoene
pyrophosphate into phytoene.



Table 7: Color phenotype of carotenoid producing E. coli strains transformed with pPRcrtB.

                                    pUC19 (control)    pPRcrtB

XL-Blue-MRF'                        white              white
(Ap, IPTG)

XL-Blue-MRF'[pACCAR25ΔcrtB]         white              yellow/orange
(Ap, Cm, IPTG)

XL-Blue-MRF'[pACCAR25ΔcrtE]         white              white
(Ap, Cm, IPTG)

Legend: see Table 6.



c) Sequence analysis of cDNA fragment

Plasmid pPRcrtB2, which contains the longest cDNA insert, was used to determine the nucleotide
sequence of the 2.5 kb cDNA. The sequence comprised 2483 nucleotides and a 20 bp poly(A) tail. An
open reading frame (ORF) of 684 amino acids was predicted. The nucleotide sequence and deduced
amino acid sequence are shown in SEQIDNOs: 12 and 13, respectively. A search in SWISS-PROT
protein sequence data bases using the Blitz amino acid sequence alignment program indicated some
amino acid homology (26 % identity in 441 aa overlap with the crtB gene of Neurospora crassa) with crtB
genes of other organisms.

Example 12 

Cloning of the phytoene desaturase gene (crtI) of Phaffia rhodozyma
a) Isolation of cDNA clone

The entire library was excised into phytoene accumulating cells of E. coli XL-Blue-MRF', which
carry the plasmid pACCAR25ΔcrtXΔcrtI (further indicated as XL-Blue-MRF'[pACCAR25ΔcrtXΔcrtI]).
The screening for the crtI gene was based on the color of the transformants. Introduction of the crtI gene
in a genetic background of XL-Blue-MRF'[pACCAR25ΔcrtXΔcrtI] would result in a restoration of the
complete route for the biosynthesis of zeaxanthin, which could be monitored by the presence of a
yellow/orange pigmented colony.

About 14,000 colonies were incubated on LB agar plates containing appropriate antibiotics and IPTG.
Two colonies were found to have changed to a yellow/orange color.

b) Characterization of complementing cDNA clones

These colonies were streaked on LB-ampicillin agar plates. Plasmid DNA, designated pPRcrtI.1 and
pPRcrtI.2, was isolated from these yellow colonies and found to include a 2.0 kb fragment (Fig 2C).
One of the resulting plasmids, pPRcrtI.1, was used for retransformation experiments (Table 8). Only the
transformation of XL-Blue-MRF'[pACCAR25ΔcrtXΔcrtI] with pPRcrtI resulted in a white to yellow
color change in phenotype. Therefore we tentatively conclude that we have cloned a cDNA of P.
rhodozyma encoding phytoene desaturase, which is involved in the conversion of phytoene to lycopene.



Table 8: Color phenotype of carotenoid producing E. coli strains transformed with pPRcrtI.

                                       pUC19     pPRcrtI

XL-Blue-MRF'                           white     white
(Ap, IPTG)

XL-Blue-MRF'[pACCAR25ΔcrtXΔcrtI]       white     yellow/orange
(Ap, Cm, IPTG)

XL-Blue-MRF'[pACCAR25ΔcrtB]            white     white
(Ap, Cm, IPTG)

Legend: see Table 6.



c) Sequence analysis of cDNA fragment 

One of the pPRcrtI plasmids was used to determine the nucleotide sequence of the 2.0 kb cDNA. The
sequence comprised 2038 nucleotides and a 20 bp poly(A) tail. An open reading frame (ORF) of 582
amino acids was predicted. The nucleotide sequence and deduced amino acid sequence are shown in
SEQIDNOs: 16 and 17, respectively. A search in SWISS-PROT protein sequence data bases using the
Blitz amino acid sequence alignment program indicated amino acid homology to the phytoene
desaturase gene of N. crassa (53% identity in 529 aa overlap).






WO 97/23633






PCT/EP96/05887 



Example 13 

Cloning of the lycopene cyclase gene (crtY) of Phaffia rhodozyma
a) Isolation of cDNA clone

The entire library was excised into lycopene accumulating cells of E. coli XL-Blue-MRF', which
carry the plasmid pACCRT-EIB (further indicated as XL-Blue-MRF'[pACCRT-EIB]). The screening
for the crtY gene was based on the color of the transformants. Introduction of the crtY gene in a genetic
background of XL-Blue-MRF'[pACCRT-EIB] would result in a restoration of the complete route for
the biosynthesis of β-carotene, which could be monitored by the presence of a yellow pigmented colony.
About 8,000 colonies were incubated on LB agar plates containing appropriate antibiotics and IPTG.
One colony was found to have changed to a yellow color.

b) Characterization of complementing cDNA clone

This colony was streaked on LB-ampicillin agar plates. Plasmid DNA was isolated from this yellow
colony and found to include a 2.5 kb fragment (Fig 2B). The resulting plasmid, designated pPRcrtY, was
used for retransformation experiments (Table 9). Surprisingly, not only transformation of
XL-Blue-MRF'[pACCRT-EIB] but also transformation of XL-Blue-MRF'[pACCAR25ΔcrtB] with pPRcrtY
resulted in a red to yellow color change in phenotype.



Table 9: Color phenotype of carotenoid producing E. coli strains transformed with pPRcrtY.

                                    pUC19     pPRcrtY

XL-Blue-MRF'                        white     white
(Ap, IPTG)

XL-Blue-MRF'[pACCRT-EIB]            red       yellow
(Ap, Cm, IPTG)

XL-Blue-MRF'[pACCAR25ΔcrtB]         red       yellow
(Ap, Cm, IPTG)

Legend: see Table 6.



A second transformation experiment was carried out including the previously cloned cDNA of pPRcrtB.
As shown in Table 10 the cDNA previously (Example 11) isolated as encoding phytoene synthase was able
to complement the crtY deletion, resulting in the biosynthesis of β-carotene in
XL-Blue-MRF'[pACCRT-EIB].
Sequence analysis of the cDNA insert of pPRcrtY (SEQIDNOs: 18 and 19) showed that it was similar to
the sequence of the cDNA fragment of pPRcrtB.

From these data we tentatively conclude that we have cloned a cDNA of P. rhodozyma encoding phytoene
synthase and lycopene cyclase, which are involved in the conversion of 2 GGPP molecules via
prephytoene pyrophosphate into phytoene and of lycopene to β-carotene, respectively. This is the first gene
in a biosynthetic pathway of carotenoid synthesis that encodes two enzymatic activities.

Table 10: Color phenotype of carotenoid producing E. coli strains transformed with different
cDNAs of Phaffia rhodozyma (Ap, Cm, IPTG).

                                 pUC19     pPRcrtE          pPRcrtB          pPRcrtY

XL-Blue-MRF'[pACCAR25ΔcrtE]      white     yellow/orange    white            white

XL-Blue-MRF'[pACCAR25ΔcrtB]      white     white            yellow/orange    yellow/orange

XL-Blue-MRF'[pACCRT-EIB]         red       red              yellow           yellow

Legend: see Table 6.



Example 14 

Cloning of the isopentenyl diphosphate (IPP) isomerase gene (idi) of Phaffia rhodozyma

a) Isolation of cDNA clone

The entire Phaffia cDNA library was excised into lycopene accumulating cells of E. coli XL-Blue-MRF',
each carrying the plasmid pACCRT-EIB (further indicated as XL-Blue-MRF'[pACCRT-EIB]).
About 15,000 colonies were incubated on LB agar plates containing appropriate antibiotics and IPTG.
One colony was found to have a dark red colour phenotype.
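
The screens in Examples 10 to 14 can be compared by expressing each as positives per 10,000 colonies screened. The counts below are the approximate figures quoted in the text of those Examples; the dictionary and function names are illustrative only.

```python
# (positive colonies, approximate number of colonies screened) per target gene
screens = {
    "crtE": (1, 8000),
    "crtB": (3, 25000),
    "crtI": (2, 14000),
    "crtY": (1, 8000),
    "idi": (1, 15000),
}


def hits_per_10000(positives, screened):
    """Number of positive colonies per 10,000 colonies screened."""
    return 1e4 * positives / screened


rates = {gene: hits_per_10000(*counts) for gene, counts in screens.items()}
for gene, rate in rates.items():
    print(f"{gene}: {rate:.2f} positives per 10,000 colonies")
```

All five screens fall in the range of roughly 0.7 to 1.4 positives per 10,000 colonies. With one to three positives per screen these figures are indicative only.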

b) Characterization of complementing cDNA clone 

This colony was streaked on LB-ampicillin agar plates. Plasmid DNA was isolated from this
colony and found to include a 1.1 kb fragment. The resulting plasmid, designated pPRcrtX, was used for
retransformation experiments (Table 11).

All colonies of XL-Blue-MRF'[pACCRT-EIB] transformed with pPRcrtX had a dark red phenotype.
From these data we tentatively concluded that we have cloned a cDNA of P. rhodozyma, expression of
which results in an increased lycopene production in a genetically engineered E. coli strain.

Table 11: Color phenotype of carotenoid producing E. coli strains transformed with pPRcrtX.

                                pUC19     pPRcrtX

XL-Blue-MRF'                    white     white
(Ap, IPTG)

XL-Blue-MRF'[pACCRT-EIB]        red       dark red
(Ap, Cm, IPTG)

Legend: see Table 6.



c) Sequence analysis of cDNA fragment 

In order to resolve the nature of this gene the complete nucleotide sequence of the cDNA insert in
pPRcrtX was determined. The nucleotide sequence consists of 1144 bp: 1126
nucleotides and a poly(A) tail of 18 nucleotides. An open reading frame (ORF) of 251 amino acids with
a molecular mass of 28.7 kDa was predicted. The nucleotide sequence and deduced amino acid sequence
are shown in SEQIDNOs: 20 and 21, respectively.
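
The figures given for the pPRcrtX insert can be cross-checked with simple arithmetic. The sketch assumes that the 251 amino acids exclude the stop codon and that the ORF ends in a single stop codon; the variable names are illustrative.

```python
cdna_nt = 1126      # cDNA sequence without the poly(A) tail
poly_a_nt = 18      # poly(A) tail
orf_aa = 251        # predicted open reading frame, in amino acids
mass_kda = 28.7     # predicted molecular mass

insert_bp = cdna_nt + poly_a_nt
# Assumption: one stop codon follows the 251 sense codons
coding_nt = 3 * (orf_aa + 1)
untranslated_nt = cdna_nt - coding_nt
mean_residue_da = mass_kda * 1000.0 / orf_aa

print(f"insert length: {insert_bp} bp")
print(f"coding region incl. stop codon: {coding_nt} nt, untranslated: {untranslated_nt} nt")
print(f"mean residue mass: {mean_residue_da:.1f} Da")
```

The total agrees with the 1144 bp stated for the insert, and the mean residue mass of about 114 Da is in the range typical for proteins, so the quoted numbers are internally consistent.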

A search in SWISS-PROT protein sequence data bases using the Blitz amino acid sequence alignment
program indicated amino acid homology to isopentenyl diphosphate (IPP) isomerase (idi) of S.
cerevisiae (42.2 % identity in 200 amino acid overlap). IPP isomerase catalyzes an essential activation
step in the isoprene biosynthetic pathway which synthesizes the 5-carbon building block of carotenoids. In
analogy to yeast the gene of Phaffia was called idi1. The cDNA clone carrying the gene was then called
pPRidi.

Example 15 

Overexpression of the idi gene of P. rhodozyma in a carotenogenic E. coli
Lycopene accumulating cells of E. coli XL-Blue-MRF', which carry the plasmid pACCRT-EIB
(further indicated as XL-Blue-MRF'[pACCRT-EIB]), were transformed with pUC19 and pPRidi and
transformants were selected on solidified LB-medium containing Amp and Cm. The transformants, called
XL-Blue-MRF'[pACCRT-EIB/pUC19] and [pACCRT-EIB/pPRidi], were cultivated in 30 ml LB-medium
containing Amp, Cm and IPTG at 37 °C at 250 rpm for 16 h. From these cultures 1 ml was used for
carotenoid extraction and analysis. After centrifugation the cell pellet was dissolved in 200 µl acetone and
incubated at 65 °C for 30 minutes. Fifty µl of the cell-free acetone fraction was then used for
high-performance liquid chromatography (HPLC) analysis. The column (Chrompack cat. 28265; packing
Nucleosil 100C18) was developed with water-acetonitrile-2-propanol (from 0 to 45 minutes 9:10:81 and
after 45 minutes 2:18:80) at a flow rate of 0.4 ml per minute and recorded with a photodiode array
detector at 470 +/- 20 nm. Lycopene was shown to have a retention time of about 23 minutes under
these conditions. The peak area was used as the relative lycopene production (mAU*s). The relative
lycopene production was 395 and 1165 for XL-Blue-MRF'[pACCRT-EIB/pUC19] and
[pACCRT-EIB/pPRidi], respectively.

These data show the potential of metabolic pathway engineering in Phaffia, as increased
expression of the idi of Phaffia rhodozyma causes a 3-fold increase in carotenoid biosynthesis in E. coli.
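
The 3-fold figure follows directly from the two peak areas. A minimal sketch of the calculation, with illustrative variable names:

```python
# Relative lycopene production (HPLC peak area, mAU*s) as reported above
control_area = 395.0    # XL-Blue-MRF'[pACCRT-EIB/pUC19]
idi_area = 1165.0       # XL-Blue-MRF'[pACCRT-EIB/pPRidi]

fold_increase = idi_area / control_area
print(f"lycopene peak area ratio: {fold_increase:.2f}-fold")
```

The ratio is about 2.95, which the text rounds to 3-fold.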

This cDNA may be over-expressed in a transformed Phaffia cell with a view to enhancing
carotenoid and/or xanthophyll levels. The cDNA is suitably cloned under the control of a promoter
active in Phaffia, such as a strong promoter according to this invention, for example a Phaffia glycolytic
pathway promoter, such as the GAPDH-gene promoter disclosed herein, or a Phaffia ribosomal protein
gene promoter according to the invention (vide infra). Optionally, the cDNA is cloned in front of a
transcriptional terminator and/or polyadenylation site according to the invention, such as the
GAPDH-gene terminator/polyadenylation site. The feasibility of this approach is illustrated in the next example,
where the crtB gene from Erwinia uredovora is over-expressed in Phaffia rhodozyma by way of
illustration.



Example 16 

Heterologous expression of a carotenogenic gene from Erwinia uredovora in Phaffia rhodozyma. 

The coding sequence encoding phytoene synthase (crtB) of Erwinia uredovora (Misawa et al., 
1990) was cloned between the promoter and terminator sequences of the gpd (GAPDH-gene) of Phaffia 
by fusion PCR. In two separate PCR reactions the promoter sequence of gpd and the coding sequence of 
crtB were amplified. The former sequence was amplified using the primers 5177 and 5128 and pPR8 as 
template. This latter vector is a derivative of the Phaffia transformation vector pPR2 in which the 
promoter sequence has been enlarged and the BglII restriction site has been removed. The promoter 
sequence of gpd was amplified by PCR using the primers 5226 and 5307 and plasmid pPRgpd6 as 
template. The amplified promoter fragment was isolated, digested with KpnI and BamHI and cloned in 
the KpnI-BglII fragment of vector pPR2, yielding pPR8. The coding sequence of crtB was amplified 
using the primers 5131 and 5134 and pACCRT-EIB as template. In a second fusion PCR reaction, using 
the primers 5177 and 5134, 1 µg of the amplified promoter and crtB coding region fragments was used as 
template, yielding the fusion product Pgpd-crtB. The terminator sequence was amplified under standard 
PCR conditions using the primers 5137 and 5138 and the plasmid pPRgpd6 as template. Primer 5137 
contains at the 5' end the last 11 nucleotides of the coding region of the crtB gene of E. uredovora and 
the first 16 nucleotides of the terminator sequence of the gpd gene of P. rhodozyma. By a two basepair 
substitution a BamHI restriction site was introduced. The amplified fusion product (Pgpd-crtB) and the 
amplified terminator fragment were purified, digested with HindIII and BamHI and cloned in the 
dephosphorylated HindIII site of the cloning vector pMTL25. The vector with the construct 
Pgpd-crtB-Tgpd was named pPREX1.1. 

The HindIII fragment containing the expression cassette Pgpd-crtB-Tgpd was isolated from 
pPREX1.1 and ligated in the dephosphorylated HindIII site of the Phaffia transformation vector pPR8. 
After transformation of the ligation mixture into E. coli a vector (pPR8crtB6.1) with the correct insert 
was chosen for Phaffia transformation experiments. 
Phaffia strain CBS6938 was transformed with pPR8crtB6.1, carrying the expression cassette 
Pgpd-crtB-Tgpd, and transformants were selected on plates containing G418. The relative amount of 
astaxanthin per OD660 in three G418-resistant transformants and the wild-type Phaffia strain was 
determined by HPLC analysis (Table 12). For carotenoid isolation from Phaffia the method of 
DMSO/hexane extraction described by Sedmak et al. (1990; Biotechn. Techniq. 4, 107-112) was used. 



Table 12. The relative astaxanthin production in a Phaffia transformant carrying the crtB gene of E. 
uredovora. 

Strain:                       Relative amount of astaxanthin (mAU*s/OD660) 

P. rhodozyma CBS6938          448 
P. rhodozyma CBS6938 
  [pPR8crtB6.1]#1             626 
  [pPR8crtB6.1]#2             716 
  [pPR8crtB6.1]#4             726 
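
For orientation, the values of Table 12 can be expressed as a percentage increase over the untransformed strain. The following Python sketch is an illustration added here, not part of the original example; the dictionary keys simply reuse the transformant numbers from the table.

```python
# Relative astaxanthin amounts (mAU*s/OD660) as listed in Table 12.
wild_type = 448
transformants = {"#1": 626, "#2": 716, "#4": 726}

def percent_increase(value, reference):
    """Increase of value over reference, in percent of the reference."""
    return 100.0 * (value - reference) / reference

increases = {name: percent_increase(v, wild_type)
             for name, v in transformants.items()}
mean_increase = sum(increases.values()) / len(increases)

for name, pct in increases.items():
    print(f"[pPR8crtB6.1]{name}: +{pct:.1f} % relative to CBS6938")
print(f"mean of the three transformants: +{mean_increase:.1f} %")
```

All three transformants lie roughly 40 to 62 % above the wild-type value. Because each value is a single measurement, the sketch reports plain ratios and makes no statement about statistical significance.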


Primers used: 

5128: 5' caactgccatgatggtaagagtgttagag 3' 

5177: 5' cccaagcttctcgaggtacctggtgggtgcatgtatgtac 3' 

5131: 5' izttfLttatggcagttggctcgaaaag 3' 

5134: 5' cccaagcttggatccgtc/agagcge£cgcrgcc 3' 

5137: 5' ccaaggcctaaacggatccctccaaacc 3' 

5138: 5' cccaagcttctcgagcttgatcagataaagatagagat 3' 

5307: 5' gttgaagaagggatccttgtggatga 3' 

The gpd sequences are indicated in bold, the crtB sequences in italic, additional restriction sites for 
cloning are underlined and base substitutions are indicated by double underlining. 

Example 17 

Isolation and characterization of the crtB gene of Phaffia 
It will also be possible to isolate the Phaffia rhodozyma gene corresponding to crtB and 
express it under the control of its own regulatory regions, or under the control of a promoter of a highly 
expressed gene according to the invention. The Phaffia transformation procedure disclosed herein 
invariably leads to stably integrated high copy numbers of the introduced DNA, and it is expected that 
expression of the gene under the control of its own promoter will also lead to enhanced production of 
carotenoids, including astaxanthin. To illustrate the principle, a protocol is given below for the cloning of 
the crtB genomic sequence. 

To obtain the genomic crtB gene including expression signals, the 2.5 kb BamHI-XhoI fragment 
was isolated from the vector pPRcrtB and used as probe to screen a cosmid library of Phaffia. 
The construction and screening of the library was carried out as described in Example 3, using the crtB 
gene as probe instead of the gapdh-gene. 

After the rounds of hybridization, 2 colonies were identified giving a strong hybridization signal 
on the autoradiogram after exposure. Cosmid DNA isolated from these colonies was called pPRgcrtB#1.1 
and pPRgcrtB#7, respectively. 

Chromosomal DNA isolated from Phaffia rhodozyma strain CBS 6938 and cosmid pPRgcrtB#7 
were digested with several restriction enzymes. The DNA fragments were separated, blotted and 
hybridized with an amino-terminal specific probe (0.45 kb XbaI fragment) of crtB under conditions as 
described before. After exposure, the autoradiogram showed DNA fragments of different lengths, 
generated by the different restriction enzymes, which hybridized with the crtB probe. On the basis that no 
EcoRI site is present in the cDNA clone, an EcoRI fragment of about 4.5 kb was chosen for subcloning 
experiments in order to determine the sequence of the promoter region and to establish the presence of 
intron sequences in the crtB gene. A similarly sized hybridizing fragment was also found in the 
chromosomal DNA digested with EcoRI. The fragment was isolated from an agarose gel and ligated into 
the corresponding site of pUC19. The ligation mixture was transformed to competent E. coli cells. 
Plasmids with the correct insert in both orientations, named pPR10.1 and pPR10.2, were isolated from 
the transformants. Comparison of the restriction patterns of pPR10.1/pPR10.2 and pPRcrtB digested with 
XbaI gave an indication for the presence of one or more introns, as the internal 2.0 kb XbaI fragment in 
the cDNA clone was found to be larger in the former vectors. The subclone pPR10.1 was used for 
sequence analysis of the promoter region and the structural gene by the so-called primer walking 
approach. The partial sequence of the insert is shown in SEQIDNO: 22. Comparison of the cDNA and the 
genomic sequence revealed the presence of 4 introns. 

Example 18 

Isolation of promoter sequences with high expression levels 
This example illustrates the feasibility of the "cDNA sequencing method" referred to in the 
detailed description, in order to obtain transcription promoters from highly expressed genes. 

For the isolation and identification of transcription promoter sequences from Phaffia rhodozyma 
genes exhibiting high expression levels, the cDNA library of Phaffia rhodozyma was analyzed by the 
following procedure. 

The cDNA library was plated on solidified LB-medium containing Amp and 96 colonies were 
randomly picked for plasmid isolation. The purified plasmids were digested with XhoI and XbaI and 
loaded on an agarose gel. The size of the cDNA inserts varied from 0.5 to 3.0 kb. Subsequently, these 
plasmids were used as template for a single sequence reaction using the T3 primer. For 17 cDNA clones 
no sequence data were obtained. The sequences obtained were translated in all three reading frames. For 
each cDNA sequence the longest deduced amino acid sequence was compared with the SwissProt 
protein database at EBI using the Blitz program. For 18 deduced amino acid sequences no homology to 
known proteins was found, whereas six amino acid sequences showed significant homology to 
hypothetical proteins. Fifty-five amino acid sequences were found to have significant homology to 
proteins for which the function is known. About 50 % (38/79) were found to encode ribosomal proteins, 
of which 12 full-length sequences were obtained. 

Table 13. Overview of expressed cDNAs, encoded proteins and reference to the Sequence Listing 

cDNA    coding for              SEQIDNO: 

10      ubiquitin-40S           24 
11      Glu-repr.gene           26 
18      40S rib.prot S27        28 
35      60S rib.prot P1α        30 
38      60S rib.prot L37e       32 
46      60S rib.prot L27a       34 
64      60S rib.prot L25        36 
68      60S rib.prot P2         38 
73      40S rib.prot S17A/B     40 
76      40S rib.prot S31        42 
78      40S rib.prot S10        44 
85      60S rib.prot L37A       46 
87      60S rib.prot L34        48 
95      60S rib.prot S16        50 



By sequence homology it was concluded that in Phaffia the 40S ribosomal protein S37 is fused to 
ubiquitin, as is found in other organisms as well. The nucleotide sequences and deduced amino acid 
sequences of the full length cDNA clones are listed in the sequence listing. Six ribosomal proteins were 
represented in the random pool by more than one individual cDNA clone. The 40S ribosomal proteins 
S10 (SEQIDNO:44), S37 (+ ubiquitin) (SEQIDNO:24) and S27 (SEQIDNO:28) were represented twice 
and the 60S (acidic) ribosomal proteins P2 (SEQIDNO:38), L37 (SEQIDNO:46) and L25 (SEQIDNO:36) 
were found three times. From these results we conclude that these proteins are encoded by multiple genes 
or that these genes are highly expressed. The promoter sequences of these genes are therefore new and 
promising targets for isolating high-level expression signals from Phaffia rhodozyma. 
Furthermore, a cDNA clone was isolated which showed 50 % homology to an abundant 
glucose-repressible gene from Neurospora crassa (Curr. Genet. 14: 545-551 (1988)). The nucleotide 
sequence and the deduced amino acid sequence are shown in SEQIDNO:26. One of the advantages of such 
a promoter sequence is that it can be used to separate growth (biomass accumulation) and gene expression 
(product accumulation) in large scale Phaffia fermentation. 
For the isolation of the promoter sequences of interest (as outlined above) a fragment from the 
corresponding cDNA clone can be used as probe to screen the genomic library of Phaffia rhodozyma, 
following the approach described for the GAPDH-gene promoter (Example 3, supra). Based on the 
determined nucleotide sequence of the promoter, specific oligonucleotides can be designed to construct a 
transcription fusion between the promoter and any gene of interest by the fusion PCR technique, 
following the procedure outlined in Example 5 (supra). 

Example 19 

Isolation of carotenogenic genes by heterologous hybridization 

For the identification and isolation of corresponding carotenoid biosynthetic pathway genes from 
organisms related to Phaffia rhodozyma, heterologous hybridization experiments were carried out under 
conditions of moderate stringency. In these experiments chromosomal DNA from two carotenogenic 
fungi (Neurospora crassa and Blakeslea trispora), the yeast S. cerevisiae and three yeast 
species from the genus Cystofilobasidium was used. These three carotenogenic yeasts are, based on 
phylogenetic studies, the ones most closely related to P. rhodozyma. 

Chromosomal DNA from the yeast species Cystofilobasidium infirmo-miniatum (CBS 323), C. 
bisporidii (CBS 6346) and C. capitatum (CBS 6358) was isolated according to the method developed for 
Phaffia rhodozyma, described in example 3 of European patent application 0 590 707 A1, the relevant 
portions of which are herein incorporated by reference. Isolation of chromosomal DNA from the fungi 
Neurospora crassa and Blakeslea trispora was carried out essentially as described by Kolar et al. (Gene, 62: 
127-134), the relevant parts of which are herein incorporated by reference. 

Chromosomal DNA (5 µg) of C. infirmo-miniatum, C. bisporidii, C. capitatum, S. cerevisiae, P. 
rhodozyma, N. crassa and B. trispora was digested using EcoRI. The DNA fragments were separated on 
a 0.8% agarose gel, blotted and hybridized using the following conditions. 

Hybridization was carried out at two temperatures (50 °C and 55 °C) using four different 
32P-labelled Phaffia probes. The probes were made in random-primed hexanucleotide labelling reactions 
using the XhoI-XbaI fragment(s) from the cDNA clones pPRcrtE, pPRcrtB, pPRcrtI and pPRidi as 
template. Hybridization was carried out overnight (16 h) at the indicated temperatures. After hybridization 
the filters were washed 2 times for 30 min at the hybridization temperatures using a solution of 3×SSC; 0.1 
% SDS; 0.05% sodium pyrophosphate. Films were developed after exposure of the filters to X-ray films 
in a cassette at -80 °C for 20 h. 

Using the cDNA clone of crtE of P. rhodozyma, faint signals were obtained for C. infirmo-miniatum 
and C. capitatum. Using the cDNA clone of crtB of P. rhodozyma, strong signals were obtained in 
the high molecular weight portion of DNA from C. infirmo-miniatum and C. capitatum. Furthermore, a 
strong signal was obtained in the lane loaded with digested chromosomal DNA from B. trispora. Only a 
faint signal was obtained for C. capitatum at 50 °C using the cDNA clone of crtI of P. rhodozyma. 
Using the cDNA clone of idi of P. rhodozyma, faint signals were obtained with chromosomal DNA from 
C. infirmo-miniatum, C. bisporidii and C. capitatum at both temperatures. A strong signal was obtained 
in the lane loaded with digested chromosomal DNA from B. trispora. 
We conclude that carotenoid biosynthesis cDNAs or genes, or idi cDNAs or genes, can be 
isolated from other organisms, in particular from other yeast species, by cross-hybridisation with the 
cDNA fragments coding for P. rhodozyma carotenoid biosynthesis enzymes, or isopentenyl 
pyrophosphate isomerase coding sequences respectively, using moderately stringent hybridisation and 
washing conditions (50 to 55 °C, 3×SSC). 
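
To make the salt condition concrete, the composition of the 3×SSC wash can be derived from that of 1×SSC. The Python sketch below is an added illustration; it assumes the conventional definition of 1×SSC (0.15 M sodium chloride, 0.015 M trisodium citrate), which is not spelled out in this document.

```python
# Conventional composition of 1x SSC, in mmol/l (assumed, see lead-in).
SSC_1X = {"NaCl": 150.0, "trisodium citrate": 15.0}

def ssc(strength):
    """Component concentrations (mmol/l) of an SSC solution of given strength."""
    return {component: strength * conc for component, conc in SSC_1X.items()}

wash = ssc(3)
# Each trisodium citrate contributes three sodium ions.
sodium = wash["NaCl"] + 3 * wash["trisodium citrate"]

print(wash)
print(f"total Na+ in 3x SSC: {sodium:.0f} mmol/l")
```

The comparatively high sodium concentration, together with the wash temperature of 50 to 55 °C, is what makes these conditions moderately rather than highly stringent.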



Deposited microorganisms 

E. coli containing pGB-Ph9 has been deposited at the Centraal Bureau voor Schimmelcultures, 
Oosterstraat 1, Baarn, The Netherlands, on June 23, 1993, under accession number CBS 359.3. 
The following strains have been deposited under the Budapest Treaty at the Centraal Bureau voor 
Schimmelcultures, Oosterstraat 1, Baarn, The Netherlands, on February 26, 1996: 

ID nr.     Organism   relevant feature         Deposit number 

DS31855    E. coli    crtY of P. rhodozyma     CBS 232.96 
DS31856    E. coli    crtI of P. rhodozyma     CBS 233.96 
DS31857    E. coli    crtE of P. rhodozyma     CBS 234.96 
DS31858    E. coli    crtB of P. rhodozyma     CBS 235.96 

SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(i) APPLICANT: 
(A) NAME: Gist-brocades B.V. 
(B) STREET: Wateringseweg 1 
(C) CITY: Delft 
(E) COUNTRY: The Netherlands 
(F) POSTAL CODE (ZIP): 2611 XT 

(ii) TITLE OF INVENTION: Improved methods for transforming Phaffia and 
recombinant DNA for use therein 

(iii) NUMBER OF SEQUENCES: 51 

(iv) COMPUTER READABLE FORM: 
(A) MEDIUM TYPE: Floppy disk 
(B) COMPUTER: IBM PC compatible 
(C) OPERATING SYSTEM: PC-DOS/MS-DOS 
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO) 

(v) CURRENT APPLICATION DATA: 
APPLICATION NUMBER: 
(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 25 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 
(C) INDIVIDUAL ISOLATE: AB3005 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 

QQQGAICCAA RCINM^iXSW AHTSGC 25 
(2) INFORMATION FOR SEQ ID NO:2: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 32 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 
(C) INDIVIDUAL ISOLATE: AB3006 

(ix) FEATURE: 
(A) NAME/KEY: misc_feature 
(B) LOCATION: one-of(12) 
(D) OTHER INFORMATION: /note= "N at position 12 is inosine" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 
CJGQGATCCOT ANOCVmnC RTTOTCRTRC CA 
(2) INFORMATION FOR SEQ ID NO:3: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 27 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 
(C) INDIVIDUAL ISOLATE: AB4206 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: 
GCXalGACITC raXXMOCA OSATAGC 27 
(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 32 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 
(C) INDIVIDUAL ISOLATE: AB5126 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 
TKMTOCAC AIGAlULriAA GACTGTTW3A GA 32 
(2) INFORMATION FOR SEQ ID NO:5: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 31 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 
(C) INDIVIDUAL ISOLATE: AB5127 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5: 
CTIAOCATCA TGTnQGAITCA ACAAGAIGGA T 
(2) INFORMATION FOR SEQ ID NO:6: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 40 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 
(C) INDIVIDUAL ISOLATE: AB5177 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: 
CCCAAGCTTC TCGAGGTACC TGGTGGGTGC ATGTATGTAC 40 
(2) INFORMATION FOR SEQ ID NO:7: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 
(C) INDIVIDUAL ISOLATE: AB5137 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: 
CXMGC30CTA AAAaaGaTOC CTOCAAACXr 30 
(2) INFORMATION FOR SEQ ID NO:8: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 38 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 
(C) INDIVIDUAL ISOLATE: AB5138 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8: 
CCCAAGCTTC TCGAGCTTGA TCAGATAAAG ATAGAGAT 38 

(2) INFORMATION FOR SEQ ID NO:9: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 2309 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 
(A) ORGANISM: Phaffia rhodozyma 
(B) STRAIN: CBS 6938 

(ix) FEATURE: 
(A) NAME/KEY: exon 
(B) LOCATION: 300..330 

(ix) FEATURE: 
(A) NAME/KEY: intron 
(B) LOCATION: 331..530 

(ix) FEATURE: 
(A) NAME/KEY: exon 
(B) LOCATION: 531..578 

(ix) FEATURE: 
(A) NAME/KEY: intron 
(B) LOCATION: 579..668 

(ix) FEATURE: 
(A) NAME/KEY: exon 
(B) LOCATION: 669..690 

(ix) FEATURE: 
(A) NAME/KEY: intron 
(B) LOCATION: 691..767 

(ix) FEATURE: 
(A) NAME/KEY: exon 
(B) LOCATION: 768..805 

(ix) FEATURE: 
(A) NAME/KEY: intron 
(B) LOCATION: 806..905 

(ix) FEATURE: 
(A) NAME/KEY: exon 
(B) LOCATION: 906..923 

(ix) FEATURE: 
(A) NAME/KEY: intron 
(B) LOCATION: 924..1030 

(ix) FEATURE: 
(A) NAME/KEY: exon 
(B) LOCATION: 1031..1378 

(ix) FEATURE: 
(A) NAME/KEY: intron 
(B) LOCATION: 1379..1508 

(ix) FEATURE: 
(A) NAME/KEY: exon 
(B) LOCATION: 1509..2020 

(ix) FEATURE: 
(A) NAME/KEY: CDS 
(B) LOCATION: join(300..330, 531..578, 669..690, 768..805, 906..923, 
1031..1378, 1509..2020) 
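
As a consistency check on the feature table of SEQ ID NO:9, the CDS location can be parsed and compared with the intron entries and with the 338-residue protein of SEQ ID NO:10. The Python sketch below is an illustration added here and is not part of the sequence listing; the location string is copied from the CDS feature above and the function name is hypothetical.

```python
import re

# CDS location as given in the feature table of SEQ ID NO:9.
CDS_LOCATION = ("join(300..330, 531..578, 669..690, 768..805, "
                "906..923, 1031..1378, 1509..2020)")

def parse_join(text):
    """Return the (start, end) pairs of a join(...) location, 1-based inclusive."""
    return [(int(a), int(b))
            for a, b in re.findall(r"(\d+)\s*\.\.\s*(\d+)", text)]

exons = parse_join(CDS_LOCATION)
cds_length = sum(end - start + 1 for start, end in exons)

# Introns are the gaps between consecutive exons.
introns = [(prev_end + 1, next_start - 1)
           for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]

codons, remainder = divmod(cds_length, 3)
print("exons:", len(exons), "introns:", len(introns))
print("spliced CDS length:", cds_length, "nt")
print("codons:", codons, "remainder:", remainder)
print("residues excluding stop codon:", codons - 1)
```

The seven exons give six introns whose coordinates coincide with the intron features listed above, and the spliced length corresponds to 338 codons plus a stop codon, matching the length stated for SEQ ID NO:10.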



(xi) S^XjmCE DESaaPTICN: SEQ ID N0:9: 

GCIATGAGCA AQCaCAACIG GQCAOXSAAC G?«3AAC»GIA AL'lUiUJGfiA TCTTCCCACX: 60 

GACJ^OGS^GOC cjicrooaasc aacswaxxx: osrocxxxxx: tcoqctimg tcbgctaooc 120 

ACTrrrcrrc CAicicnrc ' ici l' iu-t i l ' cAAAftsrcrr TOCTirmA adqgcxxxxs^ iso 

aaaaaagaag AoaasAcrrT TirjiTiirrr ctocccaica tcocaaaga TdCTcncr 240 

TCAACAAC3A CTACDy:?EAC TACCACIACC AO^^CTACIT CICTAACACT CTIMCATC 299 
ATG OCT arc AAG CTT OGA ATC AAC OCT TIC G CfmiUilj ' m ' G ' mTli:ii:r 350 
Met Ala Val Lys Val Gly He Aan Gly Phe 
15 10 

TGAGCTccoc CA'itxjb'nuT TTOQCiTOrc CRromciT Trrocmcc I ' l ' icvrxTix: 4io 

TnTITCTOC (XACIGCXTT TTITnTICT ATimTnT TmOCrnC CTCTOOCXTr 470 

CATOCRTaaC ACI3tf\CACrA TCTGATCrCA TCTCACTCIU CC I UjILTIA OCTOCTACAG 530 

GA OCSA ATC OGA OGA ATC GTC CTT OGA AAC GCT ATC ATC CPC GCJT GAT A 578 
Gly Arg He Gly Arg He Val lea Arg Asn Ala lie He His Gly Asp 
15 20 25 

CTCAGIAnT Ti'i'iAATlTC TTnTTTCOC CaTCAATITC CXTCBjC'lUJ rTOVCTCTTC 638 

TCITTCCATC TCnCrCXXAC TCTCCiaCRG TC GMT CTC GTC GOC ATC AAC GA 690 

He Asp Val Val Ala He Asn Asp 
30 

GIGOGTCTAG ATOGACXSOC TOCJIOGflCOG 00C3UttCA0C GTCIGACAOC ATOCTtjrTAA 750 

CnTTCTCTC CTCCMG C CCT TTC ATC GAT CTT GAG TAC ATS GTC TAC ATS 801 
Pro Phe He Asp Leu Glu Tyr Met Val Tyr Met 
35 40 45 

TTC A GmPOICTCC CICOCOCTCA AAAAQOOSAA ACAAAGCXX3A ACAGAAOCXIS 855 
Phe 



ATCTAACCAT TCX^ndTCT TOCXTnOCT CTiUJb ' lL ' lV TCOCTCAiaVS AG TAC 910 

Lys Tyr 

GAC TOC ACC CAC G (Ji ' ltJU ' llXa T OOCICTC I CT OroOOGAACSl TCTO0GAOCX3 963 
Asp Ser Ttir His 
50 

GGULTmUJ ATCItXTSAT OOSITCXXCT ACEAAOCCAT AOCXnADOCT TOSrOCCATC 1023 

OCnCAG GT GTC TTC AAG OGA TOC GTC GAG ATC AAG GAC OGC AA3 CTC 1071 
Gly Val Phe Lys Gly Ser Val Glu He Lys Asp Gly Lys Leu 
55 60 65 

GTC OTC GW3 OGC AAG COC OTC GTC GTC TAC GGT GAG OGA GAC OCC GCC 1119 
Val He Glu Gly Lys Pro He Val Val Tyr Gly Glu Arg Asp Pro Ala 
70 75 80 

AAC ATC CAQ T33 GGA GCT GC3C GGT GOC GAC TAC CTC GTC GAG TOC ADC 1167 
Asn He Gin Tcp Gly Ala Ala Gly Ala Asp Tyr Val Veil Glu Ser Hir 
85 90 95 

OCT GTC TTC ACC ACC CAG GPG AAG GCC GfiG CTC CAC CTC AAG OGA OGA 1215 
Gly Val Phe Thr Thr Gin Glu Lys Ala Glu Leu His Leu Lys Gly Gly 
100 105 110 

GOC AAG AAG CTC GTC ATC TOT GCC OCT TOG GOC GAT GCC COC ATG TTC 1263 
Ala Lys Lys Val Val He Ser Ala Pro Ser Ala Asp Ala Pro Met Phe 
115 120 125 130 

CTC TSC OCT Grrr AAC CTC GAC AAG TAC QAC OOC AAG TAC ACC CTC GTC 1311 
Val Cys Gly Val Asn Leu Asp Lys Tyr Asp Pro Lys Ty^^ Thr Val Val 
135 140 145 



TOC AAC GCT TOG TOC ADC ACC AAC TOC TTG OCT GOC CTC GGC AAG CTC 1359 
Ser Asn Ala Ser Cys Thr Hir Aan Cys Leu Ala Pro Leu Gly Lys Val 
150 155 160 

TO ATC CAC GAC AAC TAC ADC A GTCPOTCCTT TNCnTQGAC TTCTCTGGOC 1408 
Ile His Asp Asn Tyr Thr 
165 

TnTcnrcr TOCjircmT cjcmTGncA aaocktocat AcrcAoxrc tititcaoct i468 

TcrmrcTr CATicAanA ttcoccctcc ojrocRCCAG tt <jic gag gct ctc is22 

He Val Glu Gly Leu 
170 

ATSAlXACTCTCCftCGCrACXrACCGOCAXO^ 1570 
Met Thr Hir Val His Ala Hir Tlur Ala Thr Gin Lys Tte Val Asp Gly 
175 180 185 

OCT TCC AAC AAG GAC T9G OGA GGA GGT CGA QGA GCT OGT GCC AAC ATC 1618 
Pro Ser Asn Lys Asp Tcp Arg Gly Gly Arg Gly Ala Gly Ala Asn He 
190 195 200 205 

ATTOCXZTCCiarACCGGAGCXrGCXrAAGGCCGrCQGrAAGG^ 1666 
lie Pro Ser Ser Thr Gly Ala Ala Lys Ala Val Gly Lys Val He Pro 
210 215 220 

TOC CrC AAC GGA AAG CTC ADC GGA AIG GC3C TIC OGA GIG C3CX: AOC CCC 1714 
Ser Leu Asn Gly Lys Leu Ihr Gly Met Ala Ete Arg Val Pro Thr Pro 
225 230 235 

GAT GTC TOC CTC CTC GAT CTT GTC CTC OSA ATC GAG AAG aC3C GCC TOT 1762 
Asp Val Ser Val Val Asp Leu Vcd Val Arg He Glu Lys Gly Ala Ser 
240 245 250 

TAG GAG GAG ATC AAG GW3 ACC ATC AAG AAG GCC TCC CAG AOC OCT GAG 1810 
Tyr Glu Glu He Lys Glu Thr He Lys Lys Ala Ser Gin Thr Pro Glu 
255 260 265 

CTC AAG GOT ATC CTG AAC TPC AOC GAC GAC CAG GIC GTC TCC AOC GKT 1858 
Leu Lys Gly He Leu Asn Tyr Thr Asp Asp Gin Val Val Ser Thr Asp 
270 275 280 285 

TIC ADC OCT GAC TCT GCC TOC TOC ADC TIC GAC GOC CAG GGC OCT ATC 1906 
Phe Thr Gly Asp Ser Ala Ser Ser Thr Jte Asp Ala Gin Gly Gly He 
290 295 300 

TCC CTT AAC OGA AAC TIC GIC CTT GTC TOC TOG TAC GAC AAC GAG 1954 
Ser Leu Asn Gly Asn Phe Val Lys Leu Val Ser Trp Tyr Asp Asn Glu 
305 310 315 

T9G GGA TAC TOT GOC OGA GIC TOO GAC CIT GIT TCT TPC AIC GOC GOC 2002 
Trp Gly Tyr Ser Ala Arg Val Cys Asp Leu Vcd Ser Tyr He Ala Ala 
320 325 330 

CAG GAC GOC AAG GOC TAAAOGGnC TCTOCAAACC C lClUXC n ' TlOXCiljCC 2057 
Gin Asp Ala Lys Ala 
335 

CKTrSAKriG ATTOOCIAAA TTtfSAATKTOC CACnTClTr TOTQCICEAC ClATGATCaG 2117 

TrmClUlC TrmCTTIG TGOSIGrCJGG TTGrGOGACT GIAOXAOCr CnGAGQGAC 2177 

A;^3GCAAGAA GIGAGCAAGA TTVTGAACAAG AACAACAAAG AAAAA3AGAC AAAGAAAAAA 2237 

AAAAGGAAAG AGAAAACAAT OOOOOOOOOC OOOCAAAAAA AAATCICERT CITIATCIGA 2297 

TCAAGAGATT AT 2309 



(2) INFORMATION FOR SEQ ID NO:10: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 338 amino acids 
(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10: 

Met Ala Val Lys Val Gly Ile Asn Gly Phe Gly Arg Ile Gly Arg Ile 
1               5                   10                  15 

Val Leu Arg Asn Ala Ile Ile His Gly Asp Ile Asp Val Val Ala Ile 
            20                  25                  30 

Asn Asp Pro Phe Ile Asp Leu Glu Tyr Met Val Tyr Met Phe Lys Tyr 
        35                  40                  45 

Asp Ser Thr His Gly Val Phe Lys Gly Ser Val Glu Ile Lys Asp Gly 
    50                  55                  60 

Lys Leu Val Ile Glu Gly Lys Pro Ile Val Val Tyr Gly Glu Arg Asp 
65                  70                  75                  80 

Pro Ala Asn Ile Gln Trp Gly Ala Ala Gly Ala Asp Tyr Val Val Glu 
                85                  90                  95 

Ser Thr Gly Val Phe Thr Thr Gln Glu Lys Ala Glu Leu His Leu Lys 
            100                 105                 110 

Gly Gly Ala Lys Lys Val Val Ile Ser Ala Pro Ser Ala Asp Ala Pro 
        115                 120                 125 

Met Phe Val Cys Gly Val Asn Leu Asp Lys Tyr Asp Pro Lys Tyr Thr 
    130                 135                 140 

Val Val Ser Asn Ala Ser Cys Thr Thr Asn Cys Leu Ala Pro Leu Gly 
145                 150                 155                 160 

Lys Val Ile His Asp Asn Tyr Thr Ile Val Glu Gly Leu Met Thr Thr 
                165                 170                 175 

Val His Ala Thr Thr Ala Thr Gln Lys Thr Val Asp Gly Pro Ser Asn 
            180                 185                 190 

Lys Asp Trp Arg Gly Gly Arg Gly Ala Gly Ala Asn Ile Ile Pro Ser 
        195                 200                 205 

Ser Thr Gly Ala Ala Lys Ala Val Gly Lys Val Ile Pro Ser Leu Asn 
    210                 215                 220 

Gly Lys Leu Thr Gly Met Ala Phe Arg Val Pro Thr Pro Asp Val Ser 
225                 230                 235                 240 

Val Val Asp Leu Val Val Arg Ile Glu Lys Gly Ala Ser Tyr Glu Glu 
                245                 250                 255 

Ile Lys Glu Thr Ile Lys Lys Ala Ser Gln Thr Pro Glu Leu Lys Gly 
            260                 265                 270 

Ile Leu Asn Tyr Thr Asp Asp Gln Val Val Ser Thr Asp Phe Thr Gly 
        275                 280                 285 

Asp Ser Ala Ser Ser Thr Phe Asp Ala Gln Gly Gly Ile Ser Leu Asn 
    290                 295                 300 

Gly Asn Phe Val Lys Leu Val Ser Trp Tyr Asp Asn Glu Trp Gly Tyr 
305                 310                 315                 320 

Ser Ala Arg Val Cys Asp Leu Val Ser Tyr Ile Ala Ala Gln Asp Ala 
                325                 330                 335 

Lys Ala 

(2) INFORMATION FOR SEQ ID NO:11: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 388 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 
(A) ORGANISM: Phaffia rhodozyma 

(ix) FEATURE: 
(A) NAME/KEY: promoter 
(B) LOCATION: 1..385 

(ix) FEATURE: 
(A) NAME/KEY: TATA_signal 
(B) LOCATION: 249..263 
(D) OTHER INFORMATION: /label= putative 

(ix) FEATURE: 
(A) NAME/KEY: misc_signal 
(B) LOCATION: 287..302 
(D) OTHER INFORMATION: /function= "cap-signal" 
/label= putative 

(ix) FEATURE: 
(A) NAME/KEY: misc_RNA 
(B) LOCATION: 386..388 
(D) OTHER INFORMATION: /function= "start of CDS" 

(ix) FEATURE: 
(A) NAME/KEY: misc_feature 
(B) LOCATION: 85 
(D) OTHER INFORMATION: /note= "uncertain" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11: 
TGQIt3QC?IGC AlUrAlUJl!AC GTIGAOTGAGT GGQGGGGAAA OGCGAGIACG TXJlXmCTEPCG 60 
OQCAAGGAAG AACAAOQAAG CCXIMXSCTPa: GAGCAAGCAC AACTQQSCAC OSAAOSAGAA 120 

CAGmAciCT aoGrrATcmc ocaoogacac gaqgostuic aoaaoaocAA oogooggtoc lao 
ooaocTtxsac ttaogtcagc CAOCxsy^rrr Tcrroc3\rcT crncicicr ccttccaaaa 240 
gtcittcact ttiaaaoqcx: coccaaaaaa ;«3RflC3AGQa3 ACXTmcrr 'lojnv ivixj 300 

OCATCRroCA CAAAGAICTC TCITCnCAA CAACAACIAC TJOACEAOC ACHAOCACCA 360 
CTACTTCICr AAOUL'lXjriA OCATCAIG 388 

(2) INFORMATION FOR SEQ ID NO:12: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 2546 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 
(A) ORGANISM: Phaffia rhodozyma 

(ix) FEATURE: 
(A) NAME/KEY: CDS 
(B) LOCATION: 225..2246 
(D) OTHER INFORMATION: /product= "PRcrtB" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12: 

i TCn3U3AACm ODQGATCCXr OOGGCrGO^ GAATTOGGCA OSAGOQGAAA CAAGAAf?IGG 60 

ACTOiGAGAG A^TCnTGCIG AASACHTOIA TICCAGAAAG OSAAAACAAA QGAAAGAAGC 120 

GCX3GAAGCAC ATOAOCAACT TCAOCAAOOC GCaTOCAGOCX! GP^CCTCCXSKT AGACATCATC 180 

0 

TTODOCAACr OSDOCATOC CXMCMATA GACTnTIGT aCX:A ATS AOS OCT CTC 236 

Met Thr Ala Leu 
1 

5 GCA TKT TAC CSG ATC CAT CTC A3C TKT ACT CTC CCA ATT CTT OGT CTT 284 
Ala lyr Tyr Gin lie His Ijeu lie Tyr Ihr Leu Pro lie Leu Gly Leu 
5 10 15 20 

CTC GGC CTG CTC ACT TOC 003 ATT TIG ACA AAA TTT GAC ATC TPC AAA 332 
io Leu Gly Leu Leu Thr Ser Pro He Leu Tte Lya Phe Asp lie lyr Lys 
25 30 35 

iOA TOG Arc CTC GfIA TIT ATT GOG TTT ACT GCA ADC ACA OCA TOG GAC 380 
He Ser He Leu Val Phe He Ala Phe Ser Ala Thr Thr Pro Tip Asp 
13 40 45 50 

TCA TOG ATC ATC AGA AAT GGC GCA TOG ACA TAT OCA TCA GOG GAG ACT 428 

Ser Tcp He He Arg Asn Gly Ala Trp Thr lyr Pro Ser Ala Glu Ser 

55 60 65 

10 

GGC CAA GCaC GIG TTT GGA ADG TIT CIA GAT CTT OCA TAT GAA GAG TAC 476 

Gly Gin Gly Val Phe Gly Thr Phe Leu Asp Val Pro Tyr Glu Glu Tyr 

70 75 80 

4S GCT TIC TTT GTC KIT CAA ADC GIA ATC ADC GGC TTC GTC TAC GTC TTG 524 
Ala Phe Phe Val He Gin Thr Val He Thr Gly Leu Val Tyr Val Leu 
85 90 95 100 

GOV ACT AGQ CAC CIT CTC OCA TCT CVC GOG CTT OOC AAG ACT AGA TOS 572 
30 Ala Thr Arg His Leu Leu Pro Ser Leu Ala Leu Pro Lys Thr Arg Ser 
105 110 115 

TOC GOC err TCT CTC QOG CTC AAG QOG CTC ATC OCT CTG OOC ATT ATC 620 
Ser Ala Leu Ser Leu Ala Leu Lys Ala Leu He Pro Leu Pro He He 
55 120 125 130 

TAC CIA TTT ADC OCT CAC OOC AGO 0C3V TOG OOC GAC COG CIC GIG ACA 668 
Tyr Leu Phe Thr Ala His Pro Ser Pro Ser Pro Asp Pro Leu Val Thr 
135 140 145 

60 

GKC CAC TAC TTC TAC MG OGG GCA CTC TOC TTA CTC ATC ADC OCA OCT 716 
Asp His Tyr Phe Tyr Met Arg Ala Leu Ser Leu Leu He Thr Pro Pro 
150 155 160 

65 ADC ATC CIC TTC GCA GCA TTA TCA GGC GAA TAT OCT TTC GAT TOG AAA 764 
Thr Met Leu Leu Ala Ala Leu Ser Gly Glu Tyr Ala Phe Asp Trp Lys 
165 170 175 180 

ACT OOC OGA GCA AAG TCA ACT AIT GCA GCA ATC ATC ATC OOG ADG GTC 812 
Ser Gly Arg Ala Lys Ser Thr Ile Ala Ala Ile Met Ile Pro Thr Val 
185 190 195 

TAT CTS ATT TOG CJTPl GAT TKV GIT GCT GTC GUT CAA GAC TCT TOG TOG 860 
Tyr Leu He Trp Val Asp Tyr Val Ala Val Gly Gin Asp Ser Trp Ser 
I 200 205 210 

ATC AAC GAT GfiG AAG ATT GTIA GOG TOG AGG CUT GGA GCjT CTA CTA OOC 908 
He Asn Asp Glu Lys He Val Gly Tcp Arg Ijeu Gly Gly Val Leu Pro 
215 220 225 

ATT GAG GAA GCT ATO TIC TIC TIA CIG AOS AM* OTV ATG ATT GTT CIG 956 
He Glu Glu Ala Met Phe Phe Leu Leu Thr Aan Leu Met He Val Leu 
230 235 240 

5 GGT CTG TCT GCX: T3C GAT CAT ACT CAG OOC CIA lAC CIG CIA CAC GGT 1004 
Gly Leu Ser Ala Cys Asp His Hir Gin Ala Leu Tyr Leu Leu His Gly 
245 250 255 260 

OGA ACT ATT TAT GGC AAC AAA AftG ATG CXA TCT Tt3l TTT OOC CTC ATT 1052 
0 Arg Thr He Tyr Gly Asn Lys Lys Met Pro Ser Ser Phe Pro Leu He 
265 270 275 

ACA OCXS OCT GTG CTC TOC CIG TIT TIT AGO AGC OGA OCA TAG TCT TCT 1100 
Hir Pro Pro Val Leu Ser Leu Phe Phe Ser Ser Arg Pro Tyr Ser Ser 
5 280 285 290 

CAG OCA AAA CGT GAC TIG C3AA CTG GCA GTIC AAG TIG TIG (SAG AAA AAG 1148 
Gin Pro Lys Arg Asp Leu Glu Leu Ala Val Lys Leu Leu Glu Lys Lys 
295 300 305 

w 

AGC Oaa AQC TIT TIT GFT C30C TOG OCT GGA TTT CXT AGC GAA GTT AGG 1196 
Ser Arg Ser Ete Phe Val Ala Ser Ala Gly Phe Pro Ser Glu Val Arg 
310 315 320 

\5 

C5AG AGG CIG GIT GGA CIA TAC GCA TTC TSC OGG CTG ACT GAT GAT CTT 1244 
Glu Arg Leu Val Gly Leu Tyr Ala Ete Cys Arg Val Ihr Asp Asp Leu 
325 330 335 340 

M ATC C3AC TCT OCT GAA CTA TCT TOC AAC COG CAT GOC ACA ATT GAC ATG 1292 
He Asp Ser Pro Glu Val Ser Ser Asn Pro His Ala Thr He Asp Met 
345 350 355 

GTC TCC GAT TIT CIT ACC CIA CIA TIT GOG CTC COG CTA CAC CXT TOG 1340 
45 Val Ser Asp Phe Leu Hir Leu Leu Phe Gly Pro Pro Leu His Pro Ser 
360 365 370 

CAA OCT GAC AAG ATC CTT TCT TOG OCT TIA CTT OCT CXT TOG CAC CXT 1388 
Oln Pro Asp Lys He Lea Ser Ser Pro Leu Leu Pro Pro Ser His Pro 
so 375 380 385 

TOC OGA OOC ADS GGA AIGTATOOCCTCOOGCCTCXTCXTTOGCTCTOS 1436 
Ser Arg Pro Thr Gly Met Tyr Pro Leu Pro Pro Pro Pro Ser Leu Ser 
390 395 400 

55 

OCT GOC GftS CIC GTT CAA TIC CTT AOC GAA AGG GIT OOC OTT CAA lAC 1484 
Pro Ala Glu Leu Val Gin Phe Leu Ihr Glu Arg Val Pro Val Gin Tyr 
405 410 415 420 

tio CAT TIC GOC TIC AGG TIG CTC GCT AK3 TIG CAA GOG CTG ATC CXT CXA 1532 
His Phe Ala Phe Arg Leu Leu Ala Lys Leu Gin Gly Leu He Pro Arg 
425 430 435 

TRC CXA CTC GAC GAA CTC CTT AGA GGA TAC ACT ACT GAT CTT ATC TTT 1580 
65 Tyr Pro Leu Asp C3lu Leu Leu Arg Gly Tyr Itu: Ihr Asp Leu He Phe 
440 445 450 

OCX: TEA TOG ACA GAG GCA GTC CAG CSCT OGG AAG AOS CXT ATC GPG AOC 1628 
Pro Leu Ser Ihr Glu Ala Val Gin Ala Arg Lys Ohr Pro He Glu Thr 
455 460 465

ACA OCT GAC TIG CTS GAC TAT GGTT CIA TCP GIA GCA OOC TCA CTC GCX: 1676
Hir Ala Asp l£u Leu Asp Tyr Gly Leu cys Val Ala Gly Ser Val Ala 
470 475 480 

GAG cm TIG GIC TAT C?IC TCT 1GG GCA AGT GCA OCA ACT CAG GIC CCT 1724 
Glu Ifiu Leu Val Tyr Val Ser Tcp Ala Ser Ala Pro Ser Gin Val Pro 
485 490 495 500 

QCC ACC KYPl GAA GAA ASA GAA GCT CHG TIA GIG GCA AQC OGA GAG A3G 1772 
Ala Thr lie Glu Glu Arg Glu Ala Val Leu Val Ala Ser Arg Glu Met 
505 510 515 



GGA ACT GOC CTT GAG TIG GIG AAC ATT GCT AQG GAC ATT AAA GGG GAC 1820 
Gly Hhr Ala Leu Gin Leu Val Asn He Ala Arg Asp He Lys Gly Asp 
" 520 525 530 

GCA ACA GAA GOG AGA TTT TRC CIA OCA CTC TCA TTC TIT GGT CTT 030 1868 
Ala Thr Glu Gly Arg Ete lyr Leu Pro Leu Ser Phe Phe Gly Leu Arg 
535 540 545 

30 

GAT GAA TCA AA6 CTT GOG ATC COG ACT GAT T3G AOG GAA OCT COG OCT 1916 
Asp Glu Ser Lys Leu Ala He Pro Thr Asp Tcp Thr Glu Pro Arg Pro 
550 555 560 

a CAA GAT TTC GAC AAA CTC CTC ACT CIA TCT OCT TOG TOC ACA TIA CCA 1964 
Gin Asp Phe Asp Lys l£u Leu Ser Leu Ser Pro Ser Ser Thr Leu Pro 
565 570 575 580 



30 



TCT TCA AAC GOC TCA GAA AGO TIC OQG TIC GAA T3G AAG AOG TOC TOG 2012 
Ser Ser Asm Ala Ser Glu Ser Phe Arg Pte Glu Trp Lys Thr Tyr Ser 
585 590 595 



err OCA TIA GTC GOC TAC GCA GAG GAT CTT GOC AAA CKT TCT TAT M>C 2060 
35 Leu Pro Leu Vcd Ala Tyr Ala Glu Asp Leu Ala Lys His Ser Tyr Lys 
600 605 610 

GGA ATT GAC OGA CTT OCT AOC GAG CTT C3A GOG OGA A3G OGA GOG GCT 2108 
Gly He Asp Arg Leu Pro Thr Glu Val Gin Ala Gly Met Arg Ala Ala 
40 615 620 625 

TGC GOG AGO TAC CIA CIG ATC GGC OGA GAG ATC AAA CTC GIT TOG AAA 2156 
Cys Ala Ser Tyr Leu Leu He Gly Arg Glu He Lys Val Val Trp Lys 
630 635 640 

4i 

OGA GAC CTC OGA GAG AGA AGG ACA GTT GOC OGA TOG MG AGA GIA COG 2204 
Gly Asp Val Gly Glu Arg Arg Thr Val Ala Gly Trp Arg Arg Val Arg 
645 650 655 660 

50 AAA GIC TIG ACT GIG GIC ATG AGO OGA TGG GAA GGG CAG TAAGACAGOG 2253 
Lys Val Leu Ser Val Val Met Ser Gly Tcp Glu Gly Gin 
665 670 

GAAGAAIADC GACAGACAAT GATGACTGAG AA3AAAATCA TOCICAATCT lX;i ' l ' IL '' li:iA 2313 

GGIGCIUriT TITCimCT ATIATGAOCl ACTCIAAAGG AACIGGCCTT GCAGATAnT 2373 

cicrrooocc aacirocic crrrocATaG TnGircnr ocATrmCT oqgitiacia 2433 

60 TCTCAATTCr TITICTIGCT TlTiX-TIATC AATdMACA ATrCDOAGA TGITIAGAAT 2493 
TIAmsVnG AaM3CTT2VIA GAOCAXAAAG AC17AAAAAA AAAAAAAAAA AAA 2546 



(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 673 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:

Met Thr Ala Leu Ala Tyr Tyr Gln Ile His Leu Ile Tyr Thr Leu Pro
1 5 10 15

Ile Leu Gly Leu Leu Gly Leu Leu Thr Ser Pro Ile Leu Thr Lys Phe
20 25 30

Asp Ile Tyr Lys Ile Ser Ile Leu Val Phe Ile Ala Phe Ser Ala Thr
35 40 45

Thr Pro Trp Asp Ser Trp Ile Ile Arg Asn Gly Ala Trp Thr Tyr Pro
50 55 60

Ser Ala Glu Ser Gly Gln Gly Val Phe Gly Thr Phe Leu Asp Val Pro
65 70 75 80

Tyr Glu Glu Tyr Ala Phe Phe Val Ile Gln Thr Val Ile Thr Gly Leu
85 90 95

Val Tyr Val Leu Ala Thr Arg His Leu Leu Pro Ser Leu Ala Leu Pro
100 105 110

Lys Thr Arg Ser Ser Ala Leu Ser Leu Ala Leu Lys Ala Leu Ile Pro
115 120 125

Leu Pro Ile Ile Tyr Leu Phe Thr Ala His Pro Ser Pro Ser Pro Asp
130 135 140

Pro Leu Val Thr Asp His Tyr Phe Tyr Met Arg Ala Leu Ser Leu Leu
145 150 155 160

Ile Thr Pro Pro Thr Met Leu Leu Ala Ala Leu Ser Gly Glu Tyr Ala
165 170 175

Phe Asp Trp Lys Ser Gly Arg Ala Lys Ser Thr Ile Ala Ala Ile Met
180 185 190

Ile Pro Thr Val Tyr Leu Ile Trp Val Asp Tyr Val Ala Val Gly Gln
195 200 205

Asp Ser Trp Ser Ile Asn Asp Glu Lys Ile Val Gly Trp Arg Leu Gly
210 215 220

Gly Val Leu Pro Ile Glu Glu Ala Met Phe Phe Leu Leu Thr Asn Leu
225 230 235 240

Met Ile Val Leu Gly Leu Ser Ala Cys Asp His Thr Gln Ala Leu Tyr
245 250 255

Leu Leu His Gly Arg Thr Ile Tyr Gly Asn Lys Lys Met Pro Ser Ser
260 265 270

Phe Pro Leu Ile Thr Pro Pro Val Leu Ser Leu Phe Phe Ser Ser Arg
275 280 285

Pro Tyr Ser Ser Gln Pro Lys Arg Asp Leu Glu Leu Ala Val Lys Leu
290 295 300

Leu Glu Lys Lys Ser Arg Ser Phe Phe Val Ala Ser Ala Gly Phe Pro
305 310 315 320

Ser Glu Val Arg Glu Arg Leu Val Gly Leu Tyr Ala Phe Cys Arg Val
325 330 335

Thr Asp Asp Leu Ile Asp Ser Pro Glu Val Ser Ser Asn Pro His Ala
340 345 350

Thr Ile Asp Met Val Ser Asp Phe Leu Thr Leu Leu Phe Gly Pro Pro
355 360 365

Leu His Pro Ser Gln Pro Asp Lys Ile Leu Ser Ser Pro Leu Leu Pro
370 375 380

Pro Ser His Pro Ser Arg Pro Thr Gly Met Tyr Pro Leu Pro Pro Pro
385 390 395 400

Pro Ser Leu Ser Pro Ala Glu Leu Val Gln Phe Leu Thr Glu Arg Val
405 410 415

Pro Val Gln Tyr His Phe Ala Phe Arg Leu Leu Ala Lys Leu Gln Gly
420 425 430

Leu Ile Pro Arg Tyr Pro Leu Asp Glu Leu Leu Arg Gly Tyr Thr Thr
435 440 445

Asp Leu Ile Phe Pro Leu Ser Thr Glu Ala Val Gln Ala Arg Lys Thr
450 455 460

Pro Ile Glu Thr Thr Ala Asp Leu Leu Asp Tyr Gly Leu Cys Val Ala
465 470 475 480

Gly Ser Val Ala Glu Leu Leu Val Tyr Val Ser Trp Ala Ser Ala Pro
485 490 495

Ser Gln Val Pro Ala Thr Ile Glu Glu Arg Glu Ala Val Leu Val Ala
500 505 510

Ser Arg Glu Met Gly Thr Ala Leu Gln Leu Val Asn Ile Ala Arg Asp
515 520 525

Ile Lys Gly Asp Ala Thr Glu Gly Arg Phe Tyr Leu Pro Leu Ser Phe
530 535 540

Phe Gly Leu Arg Asp Glu Ser Lys Leu Ala Ile Pro Thr Asp Trp Thr
545 550 555 560

Glu Pro Arg Pro Gln Asp Phe Asp Lys Leu Leu Ser Leu Ser Pro Ser
565 570 575

Ser Thr Leu Pro Ser Ser Asn Ala Ser Glu Ser Phe Arg Phe Glu Trp
580 585 590

Lys Thr Tyr Ser Leu Pro Leu Val Ala Tyr Ala Glu Asp Leu Ala Lys
595 600 605

His Ser Tyr Lys Gly Ile Asp Arg Leu Pro Thr Glu Val Gln Ala Gly
610 615 620

Met Arg Ala Ala Cys Ala Ser Tyr Leu Leu Ile Gly Arg Glu Ile Lys
625 630 635 640

Val Val Trp Lys Gly Asp Val Gly Glu Arg Arg Thr Val Ala Gly Trp
645 650 655

Arg Arg Val Arg Lys Val Leu Ser Val Val Met Ser Gly Trp Glu Gly
660 665 670

Gln



(2) INFORMATION FOR SEQ ID NO:14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1882 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 82..1212

(D) OTHER INFORMATION: /product= "PRcrtE"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:

OGCADGAGCC AAmAAfiGT GCACICaGC3C KTAGCTOACA C3\CAGAACIA CS^IMTOVIA 60 

CACrcATOOG GAACRCATO3 G AltS GftT TPC QC3G AAC ATC CTC ACA GCA MT 111 
20 Met Asp Tyr Ala Asn He Leu Thr Ala lie 

15 10 

CCA CIC GW3 TTT ACT CCT CAG GKT GAT ATC GIG CTC dT GAA COG TAT 159 
Pro Leu Glu Phe Thr Pro Gin Asp Asp He Val Leu Leu Glu Piro Tyr 
25 15 20 25 

CAC m: cm gga am; aac cct gga aaa gra att cga tca caa ore atc 207 

His Tyr Leu Gly Lys Asn Pro Gly Lys Glu He Arg Ser Gin Leu He 
30 35 40 

30 

GAG GCT TTC AAC TAT TGG TTS GAT C?IT: AWS AAG GPG GAT CTC GAG GIC 255 
Glu Ala Phe Asn Tyr Trp Leu Asp Val Lys Lys Glu Asp Leu Glu Val 
45 50 55 

33 ATC CAG AAC GIT GIT GGC AIG CIA OCT AOC GCT AGC TIA TTA AIG GAC 303 
He Gin Asn Val Val Gly Met Leu His Thr Ala Ser Leu Leu Met Asp 
60 65 70 

GAT GIG GAG GAT TCA TOG GIC CTC AOG OST GOG TOG OCT GIG GOC CM 351 
40 Asp Val Glu Asp Ser Ser Val Leu Arg Arg Gly Ser Pro Val Ala His 
75 BO 85 90 

CIA ATT T7UZ 03G ATT COS CAG ACA ATA AAC ACT GCA AAC TPC GIC TAC 399 
Leu He Tyr Gly He Pro Gin Thr He Asn Thr Ala Asn Tyr Val Tyr 
45 95 100 105 

TTT CIG GCT TKT CAA Gft3 AIC TTC AAG CTT OX OCA ACA COG ATA OOC 447 
Phe Leu Ala Tyr Gin Glu He ttie Lys Leu Arg Pro Thr Pro He Pro 
110 lis 120 

50 

AIG OCT GIA ATT OCT OCT TCA TCT GCT TOG CIT CAA TCA TOC GIC TOC 495 
Met Pro Val He Pro Pro Ser Ser Ala Ser Leu Gin Ser Ser Val Ser 
125 130 135 

55 TCT OCA TCC TOC TOC TOC TOG GOC TOG TCT GAA AAC GOG GGC AOG TC3V 543 
Ser Ala Ser Ser Ser Ser Ser Ala Ser Ser Glu Asn Gly Gly Thr Ser 
140 145 150 

ACT OCT AAT TOG CAG ATT COG TTC TOG AAA GAT AOG TKT CIT GAT AAA 591 
GO Thr Pro Asn Ser Gin He Pro Phe Ser Lys Asp Thr Tyr Leu Asp Lys 
155 160 165 170 

GIG MC ACA CffiC GAG AIG CIT TOC CTC CAT AGA GOG CAA GGC CIG GfiG 639 
Val He Thr Asp Glu Met Leu Ser Leu His Arg Gly Gin Gly Leu Glu 
65 175 180 185 

CIATTCTGGAGAGATAGrCrGAOGTGraCTAGCGAAGAGGAATATGrG 687 
Leu Phe Trp Arg Asp Ser Leu Thr Cys Pro Ser Glu Glu Glu Tyr Val 
190 195 200

AAA ATC CTT CIT OGA AAG ACG GC3A QCJX TTG TIC OCT AlA GOS GIC AGA 735
Lys Met Val Leu Gly Lys Ihr Gly Gly Leu Phe Arg lie Ala Val Arg 
205 210 215 

5 TTC Arc ATG GCA AAG TCA GAA TGTT GAC KTA GAC TTT GTC CAG CIT CTC 783 
Leu Met Met Ala Lys Ser Glu Cys Asp He Asp Ete Val Gin leu Val 
220 225 230 



10 



AAC TIG ATC TCA ATA TAG TIC CAG ATC AQG GAT GAC TAT AIG AAC CIT 831 
Asn Leu He Ser He Tyr Phe Gin He Arg Asp Asp Tyr Met Asn Leu 
235 240 245 250 

CAG TCr TCr GAG mT GOC CAT AAT AAG AAT TIT OCA GAG GAC CIC ACA 879 
Gin Ser Ser Glu Tyr Ala His Asn Lys Asn Phe Ala Glu Asp Leu Ohr 
a 255 260 265 

GAA GOG AAA TIC ACT TTT COC ACT OTC CAC TOG MT CAT GCC AAC OOC 927 
Glu Gly Lys Phe Ser Phe Pro Ihr He His Ser He His Ala Asn Pro 
270 275 280 

20 

TCA TOG AGA CIC GTIC ATC AAT AOS TTG CAG AAG AAA TC3G AOC 1CT OCT 975 
Ser Ser Arg Leu Val He Asn Hir Leu Gin Lys Lys Ser Thr Ser Pro 
285 2 90 295 

25 GAG ATC CIT CAC CAC TST GTA AAC "TOC AIG OOC ACA GAA ADC CAC TCA 1023 
Glu He Leu His His Cys Val Asn Tyr Met Arg Thr Glu Ihr His Ser 
300 305 310 

TTC GAA TAT ACT CPG GAA GIC CIC AAC ACC TIG TCA GC?r GCA CIC GAG 1071 
yo Phe Glu Tyr Hir Gin Glu Val Leu Asn Thr Leu Ser Gly Ala Leu Glu 
315 320 325 330 

AGA OftA CTA OGA AGG CIT CAA GGA GAG TTC GCA GAA GCT AAC TCA AGG 1119 
Arg Glu L^ Gly Arg Leu dn Gly Glu Phe Ala Glu Ala Asn Ser Arg 
35 335 340 345 



AIG GAT CIT GGA GAC GIA GAT TOS GAA GGA AGA AOG GOG AAG AAC OTC 1167 
Met Asp Leu Gly Asp Val Asp Ser Glu Gly Arg Thr Gly Lys Asn Val 
40 350 355 360 

AAA TIG GAA QCG ATC CIG AAA AAG CIA GOC GAT ATC CCT CTG TGAAAGAACA 1219 
Lys Leu Glu Ala He Leu Lys Lys Leu Ala Asp He Pro Leu 
365 370 375 

45 

TTOTCICrCr CrOGIClCTC OGmCIATC MGGrmAT AAGITGICIC TI ' mriUJrA 1279 

AQGCJITTCTC AGATGATTOG ACITGATGIG ClUlAl'iGOC CGOTCAICrr I ' i ' ilALTlUJ 1339 

50 Acrmncr ciAOCGriGCA tsoqc?otog CAncrcrro TrcKrcrrGr ctteaatitg 1399 

TTOGACATTtfl CAITOATCAT GGICTCTTCT TCnTTOGAA GAAATCTOGT GACnOTIGA 1459 

AdTCAACia TAATITVATCA TTOTCATATC TCAAAGnCIT CCTdTCICS CAAItTTGATT 1519 

55 

ociccnccA CTicocTcrr tsatitocit cicaitgatc ojmmTi ' TcmTriGc 1579 

TCTCCKJicT cTiCTnATT axxnTOosr cicr cTCrLT ajrmi:ii.T ' ii^AcmTiT 1639 

60 TTrrcjvicrr ctcictgica AcnCTCArr taatcicict MaGicrcAT orcAACACGr 1699 

GOCAAQCMG TCAnACGICT GCWaOCSTGAT GTACACTCAT TTIGOCATCC CILTIUAAG 1759 

GCJTCICATCr ATCTIGICIA TCSALTITIC CTCnTTIGA ATTTOCICQG AJiTi'lKlC'I' 1819 

TGGmiAAOC AAT9GAGAAG AGOQCAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAACTGG 1B79 

AGG 1882

(2) INFORMATION FOR SEQ ID NO:15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 376 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:

Met Asp Tyr Ala Asn Ile Leu Thr Ala Ile Pro Leu Glu Phe Thr Pro
1 5 10 15

Gln Asp Asp Ile Val Leu Leu Glu Pro Tyr His Tyr Leu Gly Lys Asn
20 25 30

Pro Gly Lys Glu Ile Arg Ser Gln Leu Ile Glu Ala Phe Asn Tyr Trp
35 40 45

Leu Asp Val Lys Lys Glu Asp Leu Glu Val Ile Gln Asn Val Val Gly
50 55 60

Met Leu His Thr Ala Ser Leu Leu Met Asp Asp Val Glu Asp Ser Ser
65 70 75 80

Val Leu Arg Arg Gly Ser Pro Val Ala His Leu Ile Tyr Gly Ile Pro
85 90 95

Gln Thr Ile Asn Thr Ala Asn Tyr Val Tyr Phe Leu Ala Tyr Gln Glu
100 105 110

Ile Phe Lys Leu Arg Pro Thr Pro Ile Pro Met Pro Val Ile Pro Pro
115 120 125

Ser Ser Ala Ser Leu Gln Ser Ser Val Ser Ser Ala Ser Ser Ser Ser
130 135 140

Ser Ala Ser Ser Glu Asn Gly Gly Thr Ser Thr Pro Asn Ser Gln Ile
145 150 155 160

Pro Phe Ser Lys Asp Thr Tyr Leu Asp Lys Val Ile Thr Asp Glu Met
165 170 175

Leu Ser Leu His Arg Gly Gln Gly Leu Glu Leu Phe Trp Arg Asp Ser
180 185 190

Leu Thr Cys Pro Ser Glu Glu Glu Tyr Val Lys Met Val Leu Gly Lys
195 200 205

Thr Gly Gly Leu Phe Arg Ile Ala Val Arg Leu Met Met Ala Lys Ser
210 215 220

Glu Cys Asp Ile Asp Phe Val Gln Leu Val Asn Leu Ile Ser Ile Tyr
225 230 235 240

Phe Gln Ile Arg Asp Asp Tyr Met Asn Leu Gln Ser Ser Glu Tyr Ala
245 250 255

His Asn Lys Asn Phe Ala Glu Asp Leu Thr Glu Gly Lys Phe Ser Phe
260 265 270

Pro Thr Ile His Ser Ile His Ala Asn Pro Ser Ser Arg Leu Val Ile
275 280 285

Asn Thr Leu Gln Lys Lys Ser Thr Ser Pro Glu Ile Leu His His Cys
290 295 300

Val Asn Tyr Met Arg Thr Glu Thr His Ser Phe Glu Tyr Thr Gln Glu
305 310 315 320

Val Leu Asn Thr Leu Ser Gly Ala Leu Glu Arg Glu Leu Gly Arg Leu
325 330 335

Gln Gly Glu Phe Ala Glu Ala Asn Ser Arg Met Asp Leu Gly Asp Val
340 345 350

Asp Ser Glu Gly Arg Thr Gly Lys Asn Val Lys Leu Glu Ala Ile Leu
355 360 365

Lys Lys Leu Ala Asp Ile Pro Leu
370 375 
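
As a consistency check on the two records above, the CDS location given for SEQ ID NO:14 (82..1212) can be compared with the stated length of SEQ ID NO:15 (376 amino acids). The following is a minimal Python sketch and is not part of the original listing. The function name `cds_protein_length` is hypothetical, and the sketch assumes that the location is 1-based and inclusive and that the range ends with the stop codon, as the position numbering in the listing suggests.

```python
def cds_protein_length(start: int, end: int) -> int:
    """Return the number of amino acids encoded by a CDS spanning start..end.

    Assumptions (not stated in the listing itself): positions are 1-based
    and inclusive, and the final codon of the range is a stop codon.
    """
    n_nucleotides = end - start + 1
    if n_nucleotides < 3 or n_nucleotides % 3 != 0:
        raise ValueError("CDS range is not a whole number of codons")
    # Subtract one codon for the terminating stop codon.
    return n_nucleotides // 3 - 1


# CDS location of SEQ ID NO:14 versus the length stated for SEQ ID NO:15.
print(cds_protein_length(82, 1212))  # 376
```

The same arithmetic can be applied to any other record in the listing whose CDS range includes the stop codon.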



(2) INFORMATION FOR SEQ ID NO:16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2056 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 46..1794

(D) OTHER INFORMATION: /product= "PRcrtI"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:

cxrroGoasAA irEAAcriGA cacataacic lAGnvrcmr actog atg oca aaa 54 

Met Gly Lys 
1 

GAA CAA GAT CPG GAT AAA CXT ACA GCT ATC ATC GIG GGA T3T GC?r ATC 102 
Glu Gin Asp Gin Asp Lys Pro Thr Ala He He Val Gly Cys Gly He 
5 10 IS 

GST OGA ATC GOC ACT GOC GCT OCT CTT GOT AAA GAA GGT TIC CAG GIC 150 
Gly Gly Xle Ala Thr Ala Ala Arg Leu Ala Lys Glu Gly Phe Gin Val 
20 25 30 35 

AOG CTG TTC GAG AAG AAC GAC TAC TOO OGA GGT OGA TOO TCT TTA ATC 198 
Thr Val Phe Glu Lys Asn Asp Tyr Ser Gly Gly Arg Cys Ser Leu He 
40 45 50 

GAG OGA GAT CUT TAT OGA TTC GAT CM G3G OOC ACT TTC CIG CTC TIG 246 
Glu Arg Asp Gly Tyr Arg Ete Asp Gin Gly Pro Ser Leu Leu Leu Leu 
55 60 65 

CCA GAT CTC TIC AAG CAG ACA TTC GAA GAT TIG GGA GW3 AAG MG GAA 294 
Pro Asp Leu Phe Lys Gin Thr Phe Glu Asp Leu Gly Glu Lys Met Glu 
70 75 80 

GAT TOG CTC GAT CIC ATC AAG TCT GAA CCC AAC TAT GTT TGC CAC TIC 342 
Asp Ttp Val Asp Leu He Lys Cys Glu Pro Asn Tyr Val Cys His Phe 
85 90 95 

CAC GAT GAA GAG ACT TTC ACT TTT TCA ACC GAC ATS GOG TTG CIC AAG 390 
His Asp Glu Glu Thr Phe Thr Phe Ser Thr Asp Met Ala Leu Leu Lys
100 105 110 115

033 GAA arc GAG GOT TIT GAA QQC AAA GAT GGA TTT GAT 030 TIC TIG 438 
Arg Glu Val Glu Arg Phe Glu Gly Lys Asp Gly Phe Asp Arg Phe Leu 
5 120 US 130 

TCG TTT ATC CAA GAA GOC AGA CAT TPC GAG CIT GCT CTC GIT CAC 486 
Ser Phe He Gin Glu Ala His Arg His Tyr Glu Leu Ala Val Val His 
135 140 145 

)0 

CTC CIG GAG AAG AAC TTC C3CT QQC TTC OCA OCA TIC TIA OQG CIA CAG 534 
Val Leu Gin Lys Asn Phe Pro Gly Phe Ala Ala Phe Leu Arg Leu Gin 
150 155 160 

(5 TTC ATT QQC CAA ATC CIG GCT CTT CAC COC TIC GAG TCT ATC TOG ACA 582 
Phe lie Gly Gin He Leu Ala Leu His Pro Phe Glu Ser He Trp Thr 
165 170 175 



20 



MA CTT TGTT OGA TOT TTC AM3 ADC GAC AGA TIA OGA MA GIC TIC TOG 630 
Arg Val Cys Arg Tyr Phe Lys Ihr Asp Arg Leu Arg Arg Val Phe Ser 
180 185 190 195 

TIT GCA GIG A3G TPC ATG GGT CAA AQC OCA TAC AGT GOG COC GGA ACA 678 
Phe Ala Val Met Tyr Met Gly Gin Ser Pro Tyr Ser Ala Pro Gly Ihr 
200 205 210 

lAT TOC TIG CIC CAA TAC ADC GAA TIG ACC GAG GGC ATC T3G TOT COG 726 
Tyr Ser Leu Leu Gin Tyr Thr Glu Leu Vnr Glu Gly He Tip Tyr Pro 
215 220 225 

AGA GGA GQC TTT T3G CAG CsIT OCT AAT ACT CTT CTT CAG ATC GIC AAG 774 
Arg Gly Gly Phe Trp Gin Val Pro Asn Thr Leu Leu Gin He Val Lys 
230 235 240 

a3Z AAC AAT COC TC3V GOC AAG TIC AAT TIC AAC GCT OCA GIT TOO CAG 822 
Arg Asn Asn Pro Ser Ala Lys Phe Asn Phe Asn Ala Pro Val Ser Gin 
245 250 255 

CTT CIT CTC TOT OCT GOC AAG GAC OGA GOG ACT GGT GTT OGA CTT GAA 870 
Val Leu Leu Ser Pro Ala Lys Asp Arg Ala Thr Gly Val Arg Leu Glu 
260 265 270 275 

TOC GOC GAG GAA CAT CAC GOC GAT GIT GIG ATT GTC AAT GCT GAC CIC 918 
Ser Gly Glu Glu His His Ala Asp Val Val He Val Asn Ala Asp Leu 
280 285 290 

GIT TPC GOC TOO GAG CAC TIG ATT OCT GAC GAT GOC AGA AAC AAG ATT 966 
Val Tyr Ala Ser Glu His Leu He Pro Asp Asp Ala Arg Asn Lys He 
295 300 305 

QQC CAA CTG GGT GAA GIC AAG MA PCT TGG TOG GOT GAC TIA GIT GOT 1014 
Gly Gin Leu Gly Glu Val Lys Arg Ser Trp Trp Ala Asp Leu Val Gly 
310 315 320 

53 GGA AAG AAG CTC AAG GGA AGT TGC AGT ACT TTG AGO TTC TAC TOG AGO 1062 
Gly Lys Lys Leu Lys Gly Ser Cys Ser Ser Leu Ser Phe Tyr xrp Ser 
325 330 335 

60 ATG GAC OGA POQ GIG GAC GGT CPS GQC GGA CAC AAT ATC TIC TTG GOC 1110 
Met Asp Arg He Val Asp Gly Leu Gly Gly His Asn He Phe Leu Ala 
340 345 350 355 

GAG GAC rrC AAG GGA TCA TIC GAC ACA ATC TIC GAG GAG TTG GOT CIC 1158 
65 Glu Asp Ete Lys Gly Ser Phe Asp Thr He Phe Glu Glu Leu Gly Leu 
360 365 370 

CCA GOC GAT OCT TOO TTT TAC GTG AAC GIT OOC TOG OGA ATC GAT CCT 1206 
Pro Ala Asp Pro Ser Phe Tyr Val Asn Val Pro Ser Arg He Asp Pro 
70 375 380 385 



TCTGCXTGCrarGAAGCriUAGATGCrATCGICArrOT 1254
Ser Ala Ala Pro Glu Gly Lys Asp Ala He Val He Leu Val Pro Cys 
390 395 400 

5 GGC CAT Arc GAC OCT TOG AAC OCT CPA GAT TAC AAC AfiG CTT GIT OCT 1302 
Gly His He Asp Ala Ser Asn Pro Gin Asp Tyr Asn Lys Leu Val Ala 
405 410 415 

OQG GGA AGG AAG TIT GTO ATC CAA AOS CTT TCC GCC AAG CTT GGA dT 1350 
10 Arg Ala Arg Lys Phe Val He Gin Ttir Leu Ser Ala Lys Leu Gly Leu 
420 425 430 435 

CCC GAC TIT GAA AAA ATG AIT GTO GCA GAG AAG GIT CAC GAT OCT OX 1398 
Pro Asp Phe Glu Lys Met He Val Ala Glu Lys Val His Asp Ala Pro 
IS 440 445 450 

TCr TGG GAG AAA GAA TIT AAC CTC AAG GAC GGA AGC ATC TIG GGA CIG 1446 
Ser Trp Glu Lys Glu Phe Asn Leu Lys Asp Gly Ser He Leu Gly Leu 
455 460 465 

20 

GCT C3\C AAC TIT ATG CAA GIT CTT GC?r TIC MG OOG AGC AOC AGA CAT 1494 
Ala His Asn Phe Met Gin Val Leu Gly Ete Arg Pro Ser Thr Arg His 
470 475 480 

23 OOC AAG TOT GAC AAG TIG TIC TIT GTC GGG GCT TOG ACT CAT CCC GGA 1542 
Pro Lys Tyr Asp Lys Leu Phe Phe Val Gly Ala Ser Ihr His Pro Gly 
485 490 495 

ACT GOG GIT OCC ATC GTC TIG GCT OGA GOC AAG TEA ACT OOC AAC CAA 1590 
M Ihr Gly Val Pro He Val Leu Ala Gly Ala Lys Leu Ihr Ala Asn Gin 
500 505 510 515 

GTIT CTC GAA TCC TTT GAC GGA TCC CCA GCT CCA GAT OCC AAT ATC TCA 1638 
Val Leu Glu Ser Phe Asp Arg Ser Pro Ala Pro Asp Pro Asn Met Ser 
33 520 525 530 

CTC TCC GIA OCA lAT GGA AAA OCT CTC AAA TCA AAT GGA AOS GGfT ATC 1686 
Leu Ser Val Pro Tyr Gly Lys Pro Leu Lys Ser Asn Gly Thr Gly He 
535 540 545 

40 

GAT TCT CAG GTC CAG CTG AAG TIC ATG GAT TIG GAG MA TOG CTA TAC 1734 
Asp Ser Gin Val Gin Leu Lys Phe Met Asp Leu Glu Arg Trp Val Tyr 
550 555 560 

« err TIG GIG TIG TIG ATT GGG OOC GIG ATC GCT OGA TCC GIT OCT GIT 1782 
Leu Leu Val Leu Leu He Gly Ala Val He Ala Arg Ser Val Gly Val 
565 570 ' 575 

CTT OCT TIC TGAAQCAMA GAACGATOGT TTCTIAGAGr TrmTTAGT 1831 
30 Leu Ala Phe 
580 

cicrrocTGr gitctctcia TOracAiAcr ciGcraGrrcr cjnuivnur aaAGOGncc i89i 
55 Tcrmcirr gigio^gagt CMAoaaaGr cictctcaac cjiuurnuA GOGcracACA 1951 

AnuriAWlC TOGAAAICIC CATCAOCTCA ACTCTGAIGT I t ^CA lLTr ' I ' l ' i ' i ' Al ' XUJi ' 2011 
60 TGCAADVIAC ATGACIGITA TaGAOOQAAA AAAAAAAAAA AAAAAAA 2058 

(2) INFORMATION FOR SEQ ID NO:17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 582 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:

Met Gly Lys Glu Gln Asp Gln Asp Lys Pro Thr Ala Ile Ile Val Gly
1 5 10 15

Cys Gly Ile Gly Gly Ile Ala Thr Ala Ala Arg Leu Ala Lys Glu Gly
20 25 30

Phe Gln Val Thr Val Phe Glu Lys Asn Asp Tyr Ser Gly Gly Arg Cys
35 40 45

Ser Leu Ile Glu Arg Asp Gly Tyr Arg Phe Asp Gln Gly Pro Ser Leu
50 55 60

Leu Leu Leu Pro Asp Leu Phe Lys Gln Thr Phe Glu Asp Leu Gly Glu
65 70 75 80

Lys Met Glu Asp Trp Val Asp Leu Ile Lys Cys Glu Pro Asn Tyr Val
85 90 95

Cys His Phe His Asp Glu Glu Thr Phe Thr Phe Ser Thr Asp Met Ala
100 105 110

Leu Leu Lys Arg Glu Val Glu Arg Phe Glu Gly Lys Asp Gly Phe Asp
115 120 125

Arg Phe Leu Ser Phe Ile Gln Glu Ala His Arg His Tyr Glu Leu Ala
130 135 140

Val Val His Val Leu Gln Lys Asn Phe Pro Gly Phe Ala Ala Phe Leu
145 150 155 160

Arg Leu Gln Phe Ile Gly Gln Ile Leu Ala Leu His Pro Phe Glu Ser
165 170 175

Ile Trp Thr Arg Val Cys Arg Tyr Phe Lys Thr Asp Arg Leu Arg Arg
180 185 190

Val Phe Ser Phe Ala Val Met Tyr Met Gly Gln Ser Pro Tyr Ser Ala
195 200 205

Pro Gly Thr Tyr Ser Leu Leu Gln Tyr Thr Glu Leu Thr Glu Gly Ile
210 215 220

Trp Tyr Pro Arg Gly Gly Phe Trp Gln Val Pro Asn Thr Leu Leu Gln
225 230 235 240

Ile Val Lys Arg Asn Asn Pro Ser Ala Lys Phe Asn Phe Asn Ala Pro
245 250 255

Val Ser Gln Val Leu Leu Ser Pro Ala Lys Asp Arg Ala Thr Gly Val
260 265 270

Arg Leu Glu Ser Gly Glu Glu His His Ala Asp Val Val Ile Val Asn
275 280 285

Ala Asp Leu Val Tyr Ala Ser Glu His Leu Ile Pro Asp Asp Ala Arg
290 295 300

Asn Lys Ile Gly Gln Leu Gly Glu Val Lys Arg Ser Trp Trp Ala Asp
305 310 315 320

Leu Val Gly Gly Lys Lys Leu Lys Gly Ser Cys Ser Ser Leu Ser Phe
325 330 335

Tyr Trp Ser Met Asp Arg Ile Val Asp Gly Leu Gly Gly His Asn Ile
340 345 350

Phe Leu Ala Glu Asp Phe Lys Gly Ser Phe Asp Thr Ile Phe Glu Glu
355 360 365

Leu Gly Leu Pro Ala Asp Pro Ser Phe Tyr Val Asn Val Pro Ser Arg
370 375 380

Ile Asp Pro Ser Ala Ala Pro Glu Gly Lys Asp Ala Ile Val Ile Leu
385 390 395 400

Val Pro Cys Gly His Ile Asp Ala Ser Asn Pro Gln Asp Tyr Asn Lys
405 410 415

Leu Val Ala Arg Ala Arg Lys Phe Val Ile Gln Thr Leu Ser Ala Lys
420 425 430

Leu Gly Leu Pro Asp Phe Glu Lys Met Ile Val Ala Glu Lys Val His
435 440 445

Asp Ala Pro Ser Trp Glu Lys Glu Phe Asn Leu Lys Asp Gly Ser Ile
450 455 460

Leu Gly Leu Ala His Asn Phe Met Gln Val Leu Gly Phe Arg Pro Ser
465 470 475 480

Thr Arg His Pro Lys Tyr Asp Lys Leu Phe Phe Val Gly Ala Ser Thr
485 490 495

His Pro Gly Thr Gly Val Pro Ile Val Leu Ala Gly Ala Lys Leu Thr
500 505 510

Ala Asn Gln Val Leu Glu Ser Phe Asp Arg Ser Pro Ala Pro Asp Pro
515 520 525

Asn Met Ser Leu Ser Val Pro Tyr Gly Lys Pro Leu Lys Ser Asn Gly
530 535 540

Thr Gly Ile Asp Ser Gln Val Gln Leu Lys Phe Met Asp Leu Glu Arg
545 550 555 560

Trp Val Tyr Leu Leu Val Leu Leu Ile Gly Ala Val Ile Ala Arg Ser
565 570 575

Val Gly Val Leu Ala Phe
580



(2) INFORMATION FOR SEQ ID NO:18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2470 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 177..2198

(D) OTHER INFORMATION: /product= "PRcrtY"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:

AACAAGAfiGT 0GAC3VC»GfiG AGRTCmGC TQAWSftCTIG TATTOCfiGAA AOGGAAAACA 60

AAGCSftAfiGAA QOXXXSAflGC ACATCACX3WV CITCRQCAAG Ca3CfICCM3C CCXSATCIOTG 120

ATT^SACATCA TCTITUCOCRA CTCUDVrCOT CCCCftACWSR. TOGfiGITnT GTOCSCA 176 

ATG AOG OCT CTC GCA XAT TAC CSG ATC CAT CIG ATC TOT ACT CTC CCA 224 
^tet Thr Ala Leu Ala T/r Tyr Gin He His Leu He Tyr Thr Leu Pro 
15 10 15 

ATT err GGT err crc ocsc era ere act toc cos att tig aca aaa irr 272 

He Leu Gly Leu Leu Gly Leu Leu Thr Sex Pro He Leu Itir Lys Phe 
20 25 30 

GAC ATC TAC AAA A3A TTO ATC CTC GIA TTT AIT GCG TTF ACT GCA AOC 320 
A^ He Tyr Lys He Ser He Leu Val Phe He Ala Phe Ser Ala Ohr 
35 40 45 

ACA CCA TOG GAC TC3V TOG ATC ATC AGA AAT GGC OCA T3G ACA TAT CCA 368 
Thr Pro Trp Asp Ser Trp He He Arg Asn Gly Ala Tcp Hir Tyr Pro 
50 55 60 

TCA GOG GAG ACT GOC CMi GGC GIG TIT GGA ACX3 TIT CIA GAT GTT OCA 416 
Ser Ala Glu Ser Gly Gin Gly Vsd Phe Gly Ihr Phe Leu Asp Val Pro 
65 70 75 80 

TAT GAA GAG TAC GCT TIC TIT GTC ATT CAA AOC GIA ATC AOC GOC TIG 464 
Tyr Glu Glu Tyr Ala Phe Phe Val He Gin Thr Val He Ihr Gly Leu 
85 90 95 

GTC TAC GTC TTG GCA ACT MG CAC CTT CTC CCA TOT CTC GOG CTT OCC 512 
Val Tyr Val Leu Ala Thr Arg His Leu Leu Pro Ser Leu Ala Ijeu Pro 
100 105 110 

AAG ACT AGA TOG TOC GOC CIT TOT CTC GOG CTC AAG GOG CTC ATC OCT 560 
Lys Thr Arg Ser Ser Ala Leu Ser Leu Ala Leu Lys Ala Leu He Pro 
115 120 125 

eiG OOC ATT ATC TAC CIA TTT AOC OCT CAC OOC AGO OCA TOG OOC GAC 608 
Leu Pro He He Tyr Leu Phe Thr Ala His Pro Ser Pro Ser Pro Asp 
130 135 140 

OOG CIC CTS ACA GAT CAC TAC TTC TPC ATG 03G GCA CTC TOC TIA CTC 656 
Pro Leu Val Thr Asp His Tyr Phe Tyr Mat Arg Ala Lsu Ser tea Leu 
145 150 155 160 

ATC AOC CCA Oer AOC ATG CTC TTG GCA OCA TIA TCA GGC GAA TAT GCT 704 
He Thr Pro Pro Thr Met Leu Leu Ala Ala Leu Ser Gly Glu Tyr Ala 
165 170 175 

TTC GAT TGG AAA PCT GGC OGA GCA AAG TCA ACT ATT GCA OCA ATC ATO 752 
Fhe Asp Trp Lys Ser Gly Arg Ala Lys Ser Thr He Ala Ala He Met 
180 185 190 

ATC OOG AOG GIG TAT CIG ATT TOG GIA GKT TAT GIT GCT GTC GCT CAA 800 
He Pro Thr Val Tyr Leu He Trp Val Asp Tyr Val Ala Vad Gly Gin 
195 200 205 

GAC TCr TGG TOG ATC AAC GAT GAG ATT GEA GGG TGG AOG CIT GGA 848 
Asp Ser Trp Ser He Asn Asp Glu Lys He Val Gly Trp Arg Leu Gly 
210 215 220 

GGT GIA CTA OOC AIT GAG GAA GCT ATG TTC TTC TEA C!IG AOG AAT CTA 896 
Gly Val Leu Pro He Glu Glu Ala Met Phe Ete Leu Leu Thr Asn Leu 
225 230 235 240 

AIG ATT GIT CTG GCT CIG TCT GOC TOC GAT CAT ACT CAG GOC CTA TAC 944 
Met He Val Leu Gly Leu Ser Ala Cys Asp His Thr Gin Ala Leu Tyr 
245 250 255 

CPS cm CAC GCT OGA ACT ATT TAT GGC AAC AAA MC ATG CCA TOT TCA 992 
Leu Leu His Gly Arg Thr Ile Tyr Gly Asn Lys Lys Met Pro Ser Ser
260 265 270

TIT 0C3C CTC ATT ACA CaSCCTCTSCICTCrCTCTITTrrAGCAGCaS^ 1040
Phe Pro Leu Ile Thr Pro Pro Val Leu Ser Leu Phe Phe Ser Ser Arg
275 280 285



CCA TAC TCr TCT CAG GOV AAA CX?r GAC TTS GAA CTC GCA GTC AAG TTG 1088 
Pro Tyr Ser Ser Gin Pro Lys Arg Asp Leu Glu Leu Ala Val Lys Leu 
290 295 300 

1(1 

TIG GAG AAA AAG AGO 033 AGC TTT TIT GIT GC3C TOS OCT GGA TIT OCT 1136 
I^u Glu Lys Lys Ser Arg Ser Phe Phe Val Ala Ser Ala Gly Phe Pro 
305 310 315 320 

15 AGC GAA GIT AGG GAG AGG CIG GIT GGA CIA TMl GCA TIC JQC 03G GIG 1184 
Ser Glu Val Arg Glu Arg Leu Val Gly Leu Tyr Ala Phe Cys Arg Val 
325 330 335 



ACT GAT GAT CIT ATC GAC TCT OCT GAA GIA TCT TCT AAC OOG CAT GOC 1232 
Ihr Asp Asp Leu lie Asp Ser Pro Glu Val Ser Ser Asn Pro His Ala 
340 345 350 

ACA ATT GAC ATC GTC TOC GAT TIT CIT ACC CIA CIA TIT GOG OCC OOG 1280 
Thr lie Asp Met Val Ser Asp Phe Leu Thr leu Leu Phe Gly Pro Pro 
355 360 365 

cm CAC OCT TOG CAA OCT GAC AAG ATC CIT TOT TOG OCT TIA CIT OCT 1328 
Leu His Pro Ser Gin Pro Asp Lys He Leu Ser Ser Pro Leu Leu Pro 
370 375 380 

OCT TOG CAC OCT TOC OGA OOC AOG GGA ATG TAT OOC CIC OOG OOP OCT 1376 
Pro Ser His Pro Ser Arg Pro Thr Gly Met lyr Pro Leu Pro Pro Pro 
385 390 395 400 

OCT TOG CTC TOG OCT OOC GAG CTC GIT CAA TIC CIT AOC GAA MG GTT 1424 
Pro Ser Leu Ser Pro Ala Glu Leu Val Gin Phe Leu Thr Glu Arg Val 
405 410 415 

OOC GTT CAA TAC CAT TIC GOC TTC AGG TTG CTC GCT AAG TTS CAA GGG 1472 
Pro Val Gin Tyr His ftie Ala Phe Arg Leu Leu Ala Lys Leu Gin Gly 
420 425 430 

CIG ATC OCT OGA TAC OCA CTC GAC GAA CTC CTT AGA GGA TAC AOC ACT 1520 
Leu He Pro Arg Tyr Pro Leu Asp Glu Leu Leu Arg Gly Tyr Thr Thr 
435 440 445 

GAT Cn ATC TTP OOC TIA TOG ACA GAG OCA GTC CAG OCT OGG AAG AOG 1568 
Asp Leu lie Phe Pro Leu Ser Thr Glu Ala Val Gin Ala Arg Lys Thr 
450 455 460 

OCT ATC GAG AOC ACA GCT GAC TTG CIG GAC TKT GGT CTA TCT GIA OCA 1616 
Pro He Glu Thr Thr Ala Asp Leu Leu Asp Tyr Gly Leu Cys Val Ala 
465 470 475 480 

GOC TCA GTC GOC GAG CIA TTG GIC TAT GtC TCT TGG GCA AGT OCA OCA 1664 
Gly Ser Val Ala Glu Leu Leu Val Tyr Val Ser Trp Ala Ser Ala Pro 
485 490 495 

ACT CAG GTC OCT GOC AOC ATA GAA GAA «3A GAA GCT GIG TIA GXG GCA 1712 
Ser Gin Val Pro Ala Thr He Glu Glu Arg Glu Ala Val Leu Val Ala 
500 505 510 

AGC OGA GfiG ATG GGA ACT GCX: CTP CAG TIG GIG AAC ATT GCT MG GAC 1760 
Ser Arg Glu Met Gly Thr Ala Leu Gin Leu Val Asn He Ala Arg Asp 
515 520 525 

ATT AAA GOG GAC GC3^ ACA GAA GGG AGA TTT T7\C CIA OCA CIC TCA TTC 1B08 
He Lys Gly Asp Ala Thr Glu Gly Arg Ete Tyr Leu Pro Leu Ser Ete 
530 535 540

TTT GGT err OGG GAT GAA TCA AfiG CFT GOG ATC CXX3 ACT GAT TOG ADG 1856
Phe Gly Leu Arg Asp Glu Ser Lys Leu Ala lie Pro Ttir Asp Trp Thr 
545 550 555 560 

5 GAA (XT CX3G OCT CAA GAT TTC GRC AAA CTC CTC MTT CIA TCT C3CT TOG 1904 
Glu Pro Arg Pro Gin Asp Phe A^ Lys Leu Leu Ser Leu Ser Pro Ser 
565 570 575 



TCrACATBlCCATCTTC3^AACGCCTCAGAAAGCTICa»TIC^ TGG 1952 
Ser Thr Leu Pro Ser Ser Asn Ala Ser Glu Ser Phe Arg Phe Glu Tqp 
580 585 590 

AAG AOS TPiC TOG CTT OCA TTA CTC GOC TAC GCR GAG GAT CTT GOC AAA 2000 
Lys Ttir Tyr Ser Leu Pro Leu Val Ala Tyr Ala Glu Asp Leu Ala Lys 
595 600 605 

CAT TCT rar AAG GGA ATT GAC OGA CTT OCT AOC GAG GTT CAA GOS GGA 2048 
His Ser Tyr Lys Gly lie Asp Arg Leu Pro Ihr Glu Val Gin Ala Gly 
610 615 620 

ATGCGAGOGGCTlGCGOSAQCTACCmCrGATCGacaGAGAGATC AAA 2096 
Met Arg Ala Ala Cys Ala Ser Tyr Leu Leu He Gly Arg Glu lie Lys 
625 630 635 640 

GIO CTT TOG AAA GGA GAC GTC GGA GAG AGA AGG ACA GTT GOC GGA TC3G 2144 
Val Val Trp Lys Gly Asp Val Gly Glu Arg Arg Thr Val Ala Gly Trp 
645 650 655 

AGG AGA Gm OGG AAA GTC TIG ACT CTG CTC ATG AGC GGA T3G GAA GOG 2192
Arg Arg Val Arg Lys Val Leu Ser Val Val Met Ser Gly Trp Glu Gly 
660 665 670 

CAG lAAGACAGOG GAM3AATA0C GACAGACAAT GATGACTSAG AAIAAAATCA 2245 
Gln

TCCTCAATCT ' lUmCX CEA GLTIUL'lUnT TTTCnTrrcr ATTATSACCA ACTCIAAAGG 2305 
AACTGGOCrr QCAGATATIT CTCTTOOOOC GATCITOCrC dTICCTTOG TTCCTTCnT 2365
OCATrmCT OGCTTTACIA TC?rC3tfOTCT TrnCTTSCT TmCTIATC AATCUiGACA 2425 
ATTCIMAGA 1U1T1W3AAT TDmOAAA AAAAAAAAAA AAAAA 2470 

(2) INFORMATION FOR SEQ ID NO:19:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 673 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:

Met Thr Ala Leu Ala Tyr Tyr Gln Ile His Leu Ile Tyr Thr Leu Pro
1 5 10 15

Ile Leu Gly Leu Leu Gly Leu Leu Thr Ser Pro Ile Leu Thr Lys Phe
20 25 30

Asp Ile Tyr Lys Ile Ser Ile Leu Val Phe Ile Ala Phe Ser Ala Thr
35 40 45

Thr Pro Trp Asp Ser Trp Ile Ile Arg Asn Gly Ala Trp Thr Tyr Pro
50 55 60

Ser Ala Glu Ser Gly Gln Gly Val Phe Gly Thr Phe Leu Asp Val Pro
65 70 75 80

Tyr Glu Glu Tyr Ala Phe Phe Val Ile Gln Thr Val Ile Thr Gly Leu
85 90 95

Val Tyr Val Leu Ala Thr Arg His Leu Leu Pro Ser Leu Ala Leu Pro
100 105 110

Lys Thr Arg Ser Ser Ala Leu Ser Leu Ala Leu Lys Ala Leu Ile Pro
115 120 125

Leu Pro Ile Ile Tyr Leu Phe Thr Ala His Pro Ser Pro Ser Pro Asp
130 135 140

Pro Leu Val Thr Asp His Tyr Phe Tyr Met Arg Ala Leu Ser Leu Leu
145 150 155 160

Ile Thr Pro Pro Thr Met Leu Leu Ala Ala Leu Ser Gly Glu Tyr Ala
165 170 175

Phe Asp Trp Lys Ser Gly Arg Ala Lys Ser Thr Ile Ala Ala Ile Met
180 185 190

Ile Pro Thr Val Tyr Leu Ile Trp Val Asp Tyr Val Ala Val Gly Gln
195 200 205

Asp Ser Trp Ser Ile Asn Asp Glu Lys Ile Val Gly Trp Arg Leu Gly
210 215 220

Gly Val Leu Pro Ile Glu Glu Ala Met Phe Phe Leu Leu Thr Asn Leu
225 230 235 240

Met Ile Val Leu Gly Leu Ser Ala Cys Asp His Thr Gln Ala Leu Tyr
245 250 255

Leu Leu His Gly Arg Thr Ile Tyr Gly Asn Lys Lys Met Pro Ser Ser
260 265 270

Phe Pro Leu Ile Thr Pro Pro Val Leu Ser Leu Phe Phe Ser Ser Arg
275 280 285

Pro Tyr Ser Ser Gln Pro Lys Arg Asp Leu Glu Leu Ala Val Lys Leu
290 295 300

Leu Glu Lys Lys Ser Arg Ser Phe Phe Val Ala Ser Ala Gly Phe Pro
305 310 315 320

Ser Glu Val Arg Glu Arg Leu Val Gly Leu Tyr Ala Phe Cys Arg Val
325 330 335

Thr Asp Asp Leu Ile Asp Ser Pro Glu Val Ser Ser Asn Pro His Ala
340 345 350

Thr Ile Asp Met Val Ser Asp Phe Leu Thr Leu Leu Phe Gly Pro Pro
355 360 365

Leu His Pro Ser Gln Pro Asp Lys Ile Leu Ser Ser Pro Leu Leu Pro
370 375 380

Pro Ser His Pro Ser Arg Pro Thr Gly Met Tyr Pro Leu Pro Pro Pro
385 390 395 400

Pro Ser Leu Ser Pro Ala Glu Leu Val Gln Phe Leu Thr Glu Arg Val
405 410 415

Pro Val Gln Tyr His Phe Ala Phe Arg Leu Leu Ala Lys Leu Gln Gly
420 425 430

Leu Ile Pro Arg Tyr Pro Leu Asp Glu Leu Leu Arg Gly Tyr Thr Thr
435 440 445

Asp Leu Ile Phe Pro Leu Ser Thr Glu Ala Val Gln Ala Arg Lys Thr
450 455 460

Pro Ile Glu Thr Thr Ala Asp Leu Leu Asp Tyr Gly Leu Cys Val Ala
465 470 475 480

Gly Ser Val Ala Glu Leu Leu Val Tyr Val Ser Trp Ala Ser Ala Pro
485 490 495

Ser Gln Val Pro Ala Thr Ile Glu Glu Arg Glu Ala Val Leu Val Ala
500 505 510

Ser Arg Glu Met Gly Thr Ala Leu Gln Leu Val Asn Ile Ala Arg Asp
515 520 525

Ile Lys Gly Asp Ala Thr Glu Gly Arg Phe Tyr Leu Pro Leu Ser Phe
530 535 540

Phe Gly Leu Arg Asp Glu Ser Lys Leu Ala Ile Pro Thr Asp Trp Thr
545 550 555 560

Glu Pro Arg Pro Gln Asp Phe Asp Lys Leu Leu Ser Leu Ser Pro Ser
565 570 575

Ser Thr Leu Pro Ser Ser Asn Ala Ser Glu Ser Phe Arg Phe Glu Trp
580 585 590

Lys Thr Tyr Ser Leu Pro Leu Val Ala Tyr Ala Glu Asp Leu Ala Lys
595 600 605

His Ser Tyr Lys Gly Ile Asp Arg Leu Pro Thr Glu Val Gln Ala Gly
610 615 620

Met Arg Ala Ala Cys Ala Ser Tyr Leu Leu Ile Gly Arg Glu Ile Lys
625 630 635 640

Val Val Trp Lys Gly Asp Val Gly Glu Arg Arg Thr Val Ala Gly Trp
645 650 655

Arg Arg Val Arg Lys Val Leu Ser Val Val Met Ser Gly Trp Glu Gly
660 665 670

Gln



(2) INFORMATION FOR SEQ ID NO:20:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1165 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Phaffia rhodozyma

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 141..896
(D) OTHER INFORMATION: /product= "PRidi"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:

crrcicnTc ctosaccict iraocAGacx: cjitsaagrct cGrmciCR mxxxMvr 60

CrCXXMTCEA TCACmCCr CVntJCAGAA CMGnCIGA GICAACOCSAA AAGAAAGAflG 120 

GCRGAfiGAAA TAIATICmS MG TOC ATG OOC AAC ATT GIT CTC (XC GOC 170
Met Ser Met Pro Asn Ile Val Pro Pro Ala
1 5 10

GAG CTC CX31 A3C GAA QGA CTC ACT TIA GAA GAG TfC GAT GAG GAG CM 218 
Glu Val Arg Thr Glu Gly Leu Ser Leu Glu Glu Tyr Asp Glu Glu Gln
15 20 25 

GTC AGG CIG ATC GAG GW3 OGA TCT ATT CTT GTT AAC CXI3 GAC GAT GIG 266 
Val Arg Leu Met Glu Glu Arg Cys Ile Leu Val Asn Pro Asp Asp Val
30 35 40

GOC TAT GGA GAG GCT TOG AAA AAG ADC TOC CMZ TIG ATG TOC AAC ATC 314 
Ala Tyr Gly Glu Ala Ser Lys Lys Thr Cys His Leu Met Ser Asn Ile
45 50 55

AAC GOG COC AAG GAC CTC CTC CAC OGA OCA TIC TOC GIG TIT CTC TIC 362
Asn Ala Pro Lys Asp Leu Leu His Arg Ala Phe Ser Val Phe Leu Phe
60 65 70 

OGC OCA TOG GAC GGA GC31 CTC CIG CTT GAG OGA ASA GOG GAC GAG AAG 410
Arg Pro Ser Asp Gly Ala Leu Leu Leu Gln Arg Arg Ala Asp Glu Lys
75 80 85 90 

ATT ACG TIC OCT GGA ATS T3G ADC AAC AOS TGT TGC AGT CAT OCT TIG 458 
Ile Thr Phe Pro Gly Met Trp Thr Asn Thr Cys Cys Ser His Pro Leu
95 100 105 

AGC ATC A;U3 GOC GAG GIT GAA GAG GW3 AAC CAG ATC GCJT GIT OGA OGA 506 
Ser Ile Lys Gly Glu Val Glu Glu Glu Asn Gln Ile Gly Val Arg Arg
110 115 120

OCT GOS TOC OSA AAG TIG GAG OWC GAG CTT OGC GIG OCT ACA TOG T03 554 
Ala Ala Ser Arg Lys Leu Glu His Glu Leu Gly Val Pro Thr Ser Ser
125 130 135

ACT COG COC GAC TOG TIC ADC TAC CIC ACT AGG AIA GAT TAC CIC OCT 602
Thr Pro Pro Asp Ser Phe Thr Tyr Leu Thr Arg Ile His Tyr Leu Ala
140 145 150 

COG AGT GAC GGA CIC TC3G GGA GAA CAC GAG ATC GAC TAC ATT CIC TIC 650
Pro Ser Asp Gly Leu Trp Gly Glu His Glu Ile Asp Tyr Ile Leu Phe
155 160 165 170 



TCA ACC ACA OCT AOV GAA CAC ACT GGA AAC OCT AAC GAA GTC TCT GAC 698
Ser Thr Thr Pro Thr Glu His Thr Gly Asn Pro Asn Glu Val Ser Asp
175 180 185

ACT OGA TAT GIC ADC AW3 COC GAG CIC CW3 GOG ATG TTT GfG GAC GAS 746 
Thr Arg Tyr Val Thr Lys Pro Glu Leu Gln Ala Met Phe Glu Asp Glu
190 195 200 

TOT AAC TCA TIT ADC OCT TOG TIC AAA TIG ATT GOC OGA GAC TTC CTG 794 
Ser Asn Ser Phe Thr Pro Trp Phe Lys Leu Ile Ala Arg Asp Phe Leu
205 210 215 

TTT GOC TOG T3G GAT CAA CTT CIC GOC AGA OGA AAT GAA AAG GGT GAG 842 
Phe Gly Trp Trp Asp Gln Leu Leu Ala Arg Arg Asn Glu Lys Gly Glu
220 225 230 

GIC GAT GOC AAA TOG TTG GAG GAT CIC TOG GAC AAC AAA GIC T3G A«3 890 
Val Asp Ala Lys Ser Leu Glu Asp Leu Ser Asp Asn Lys Val Trp Lys 
235 240 245 250 

ATG TAGICGACOC TTCITICTGr ACAGICATCT CAGTrOGOCT GnQGITGCT 943 
Met

TacnCTTOC TcrrcrrxCT Mnvmicnr titcttooct aac?mGAcrr GAicmciA 1003

C3^37VQC3aRC GC3V370raC ATT^AACTC'IA TTICnGITC TTIATCICTC TTUnMOQA 1063 
ATCTTCMGA IHtfOTTCTT TnaOGCrAC AACATTICAG ATCAMATIG CnTTCaGAC 1123
TACAAAAAAA AAAAAAAAAA ACICGAGQQG aGG!CXXX3C?IA OC 1165

(2) INFORMATION FOR SEQ ID NO:21:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 251 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:

Met Ser Met Pro Asn Ile Val Pro Pro Ala Glu Val Arg Thr Glu Gly
1 5 10 15

Leu Ser Leu Glu Glu Tyr Asp Glu Glu Gln Val Arg Leu Met Glu Glu
20 25 30

Arg Cys Ile Leu Val Asn Pro Asp Asp Val Ala Tyr Gly Glu Ala Ser
35 40 45

Lys Lys Thr Cys His Leu Met Ser Asn Ile Asn Ala Pro Lys Asp Leu
50 55 60

Leu His Arg Ala Phe Ser Val Phe Leu Phe Arg Pro Ser Asp Gly Ala
65 70 75 80

Leu Leu Leu Gln Arg Arg Ala Asp Glu Lys Ile Thr Phe Pro Gly Met
85 90 95

Trp Thr Asn Thr Cys Cys Ser His Pro Leu Ser Ile Lys Gly Glu Val
100 105 110

Glu Glu Glu Asn Gln Ile Gly Val Arg Arg Ala Ala Ser Arg Lys Leu
115 120 125

Glu His Glu Leu Gly Val Pro Thr Ser Ser Thr Pro Pro Asp Ser Phe
130 135 140

Thr Tyr Leu Thr Arg Ile His Tyr Leu Ala Pro Ser Asp Gly Leu Trp
145 150 155 160

Gly Glu His Glu Ile Asp Tyr Ile Leu Phe Ser Thr Thr Pro Thr Glu
165 170 175

His Thr Gly Asn Pro Asn Glu Val Ser Asp Thr Arg Tyr Val Thr Lys
180 185 190

Pro Glu Leu Gln Ala Met Phe Glu Asp Glu Ser Asn Ser Phe Thr Pro
195 200 205

Trp Phe Lys Leu Ile Ala Arg Asp Phe Leu Phe Gly Trp Trp Asp Gln
210 215 220

Leu Leu Ala Arg Arg Asn Glu Lys Gly Glu Val Asp Ala Lys Ser Leu
225 230 235 240

Glu Asp Leu Ser Asp Asn Lys Val Trp Lys Met
245 250



WO 97/23633 PCT/EP96/05887

(2) INFORMATION FOR SEQ ID NO:22:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3550 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Phaffia rhodozyma
(B) STRAIN: CBS 6938

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 941..966

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 967..1077

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 1078..1284

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 1285..1364

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 1365..1877

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 1878..1959

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 1960..2202

(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 2203..2292

(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 2293..3325

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: join(941..966, 1078..1284, 1365..1877, 1960..2202, 2293..3325)
(D) OTHER INFORMATION: /product= "PRGcrtB GB"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:
GC3AATraCAG TiTiUJL'iTr GAOSAGAAAG GACACIGGS T TOGAAAGAGA AGATQCJIADG 60 
TTCnCTCJCA UjriuAKlCT GTiUCTlVO' AGaCKTOTrr GACACQCDtfV TOCKmCTT 120 
TOCACITTSA CrmGAACT ATGGfTQGITG GG0C3ATO00C AAAATCATTA G L' X ' XL ' IftCrr 180 

auxriimm ocroGArorc AicnAcn^ CMsrorrac ArrcrcACXiT AaoaocrcrrT 240
CTTICTICTC TOGACIGGGC CATOGAAAftG GAITVmOSA TSytfVTACKIC AClOysmTC 300

QOTOSATCIG TSCfiOGCAfiG AATOQAOCnS TOOGAAGCTG AGrI3^a30GrrC TTdUmTC 360 

TOCaAIMCCA ACXSGRDGCIA TITICTGACA GAMGAIGAG ACBVTOCftAC AGCTCftAACA 420

AACDWaSCT CntaAnAAT C3\D0a3CTC3V ACITOTTOCT CRMZTCM^ QGACTOGCXjC 480 

TGAAAGAACA GTICmGAC AAAAACAIGS TCXXTATAGG AGAAIQQGaVT G05AATCT36 540 

AlGARU'lbTr QCal'lUUMjAT CAOnGAGG?^ Cfti'imwaA GGACAATn^ Cn^CTTM^ 600 

TOUmOOG MTnOCTOG M03GCATOC flGOOOOGGAT TSAIOSaCIG ATOG0CX3GAA 660 

ATSIGAIGKr GGTOGAAACT OSATCICTCr TTmTCITC MCTTCTCAT OCr i l-Til-' li: 720

TCmCD^CT GACSatXSaC TOCAACICTC IT^TCRGIT CGGRAAC3WU3 AAGFIQGACAC 780 

AGAGAGAlCr TTGCTSAAGA GTib'iATiUJ AGAAAGC3GAA. AACAAAGGA;V AGAAGCXSOCXj 840 

AAGC3\C3CrcA OCMCITOG CAAGCXXXflC CJtfaOOOSATC TO3GAIRGRC ATCKICmC 900 

CCMCTCUJIPk TCPCrCOOCRPi CKaAIMftGrT TmGTOGCA MG AOG OCT CTC GOV 955 

Met Thr Ala Leu Ala
1 5

mr lAC GAG AT GTnUiL ' lO; ATACXTCTTC T l UUTmo: ACaOCRCTCA 1006
Tyr Tyr Gln Ile

TGTICTGCAm TCTGroiGOG TCJCTTCCAAA TCTITCRKIG ACIMCRlCr ITOCXCTGCT 1066 

CTTCnCnA G C COT CIG MC TOT ACT CIC OCA ATT CTT GGT CTT CTC 1114 
His Leu Ile Tyr Thr Leu Pro Ile Leu Gly Leu Leu
10 15 20

GGC CIG CTC ACT TOC COG AIT TIG ACA AAA TTT GAC ATC TAG AAA AIA 1162 
Gly Leu Leu Thr Ser Pro Ile Leu Thr Lys Phe Asp Ile Tyr Lys Ile
25 30 35

TCG ATC CTC GfIA TTT ATT G03 TIT AjT OCA ADC ACA CCA TOG GAC TCA 1210
Ser Ile Leu Val Phe Ile Ala Phe Ser Ala Thr Thr Pro Trp Asp Ser
40 45 50 

TGG ATC ATC AGA AAT GGC OCA TSG ACA TAT CCA TCA GOG GAG ACT GGC 1258
Trp Ile Ile Arg Asn Gly Ala Trp Thr Tyr Pro Ser Ala Glu Ser Gly
55 60 65 

CAA GGC GIG TIT GGA AGS TIT CIA GA GirAGTOSAC OUTXAATACr 1304 
Gln Gly Val Phe Gly Thr Phe Leu Asp
70 75 

CnSQOCXSOS CGTOGmOC GOGRTEACAT TTAACMCrQ AATl'lA'lUJC TGATC3tfOG 1364 

T GIT OCA TAT GAA GAG lAC OCT TIC TIT CTC ATT CAA AOC GTA ATC 1410
Val Pro Tyr Glu Glu Tyr Ala Phe Phe Val Ile Gln Thr Val Ile
80 85 90 

ACC GQC TIG GIC TAC GTC TIG GCA ACT flGG CM2 CTT CTC OCA TCT CTC 1458 
Thr Gly Leu Val Tyr Val Leu Ala Thr Arg His Leu Leu Pro Ser Leu
95 100 105 

GCG err occ aag act ma tog .toc goc err tct ctc gcg ctc aag qog isoe 

Ala Leu Pro Lys Thr Arg Ser Ser Ala Leu Ser Leu Ala Leu Lys Ala
110 115 120 125

CTC ATC OCT CIG OOC ATT ATC TOC CTA TTT AOC GCT CAC OCC AGO OCA 1554 
Leu Ile Pro Leu Pro Ile Ile Tyr Leu Phe Thr Ala His Pro Ser Pro
130 135 140

T03 OOC GAC 003 CIC GTG ACA GKT C7\C TAC TTC TAC ATO OGG GOl CIC 1602
Ser Pro Asp Pro Leu Val Thr Asp His Tyr Phe Tyr Met Arg Ala Leu
145 150 155

TOC TIA CTC flic ADC OCA OCT ADC AIG CXC TIG GC3^ GCA TIA TCA GQC 1650
Ser Leu Leu Ile Thr Pro Pro Thr Met Leu Leu Ala Ala Leu Ser Gly
160 165 170

GAA TAT GCr TTC GAT TOG AAA ACT GQC OGA GCA AAG TCA ACT ATT GCA 1698 
Glu Tyr Ala Phe Asp Trp Lys Ser Gly Arg Ala Lys Ser Thr Ile Ala
175 180 185 

OCA ATC AIG ATC COG ADG GTG TKT CTS AIT T3G GIA GAT TOT GTT OCT 1746 
Ala Ile Met Ile Pro Thr Val Tyr Leu Ile Trp Val Asp Tyr Val Ala
190 195 200 205

GTC OGT CAA GAC TCP TGG TOG ATC AAC GAT GAG AftG ATT GIA GOG TGG 1794
Val Gly Gln Asp Ser Trp Ser Ile Asn Asp Glu Lys Ile Val Gly Trp
210 215 220

AGG err GGA GGrT GXA CIA OOC ATT GFG GAA OCT ATC3 TTC TTC TIA CIG 1842
Arg Leu Gly Gly Val Leu Pro Ile Glu Glu Ala Met Phe Phe Leu Leu
225 230 235

ADG AAT CTA AIG AIT GIT CIG GGT CIG 1CT GGC TG C?IAfiGrnGAT 1887
Thr Asn Leu Met Ile Val Leu Gly Leu Ser Ala Cys
240 245

CTCATDCICT dTOCirrGG T3AAAAAAGC TGITIGQCIG ATIGCIGaGA ACICADOCAT 1947 

OQGAMCTGT AG C GMT CAT ACT CAG GOC CIA TOC CTC CIA CAC QGT OGA 1996 
Asp His Thr Gln Ala Leu Tyr Leu Leu His Gly Arg
250 255 260 



ACT ATT TAT QGC AAC AAA AAG ATG OCA TCT TCA TIT CCC CTC ATT ACA 2044
Thr Ile Tyr Gly Asn Lys Lys Met Pro Ser Ser Phe Pro Leu Ile Thr
265 270 275

COG OCT GIG CIC TOC CIG TTT TIT AGC AGO OGA OCA T7VC TCT TCT CPG 2092
Pro Pro Val Leu Ser Leu Phe Phe Ser Ser Arg Pro Tyr Ser Ser Gln
280 285 290 

OCA AAA OGT GAC TIG GAA CIG GCA GTC AftS TIG TTG GAG AAA AAG AGC 2140 
Pro Lys Arg Asp Leu Glu Leu Ala Val Lys Leu Leu Glu Lys Lys Ser
295 300 305 

OGG MC TTT TTT GIT GOC TOS GCP GGA TTT OCT AGC GAA CTT AGG GAG 2188 
Arg Ser Phe Phe Val Ala Ser Ala Gly Phe Pro Ser Glu Val Arg Glu
310 315 320 325

AGG CIG C?IT GGA CI GTSAGCADQC ATTCnTROG Ti'iUl'lUJtji' ClTll^AOCIT 2242 
Arg Leu Val Gly Leu 
330 

CAroiGCATT OGCIGAIOG TmdTGGr GATOOOGGAC CIGCKIACW3 A TPC GCA 2299 

Tyr Ala 



60 TTC TOC COG GTG ACT GAT GAT CTT KTC GAC TCT OCT GAA GIA TCT TOC 2347 
Phe Cys Arg Val Thr Asp Asp Leu Ile Asp Ser Pro Glu Val Ser Ser
335 340 345 

AAC OOG CAT GOC ACA ATT GAC ATG GIC TOC GAT TIT CTT ADC CTA CTA 2395 
Asn Pro His Ala Thr Ile Asp Met Val Ser Asp Phe Leu Thr Leu Leu
350 355 360 

TTT GOG OOC OOG CIA CAC OCT TOG CAA OCT GAC AAG ATC CTT TCT TOG 2443 
Phe Gly Pro Pro Leu His Pro Ser Gln Pro Asp Lys Ile Leu Ser Ser
365 370 375 380

OCT TTA err cor ccr tos cac cct tcc osa coc acq oca atg tat ocx: 2491
Pro Leu Leu Pro Pro Ser His Pro Ser Arg Pro Thr Gly Met Tyr Pro
385 390 395 

Crc CXX3 OCT OCT CXT TGC3 CIC TOG OCT GCC GAG CTC CTT CAA TTC CTT 2539
Leu Pro Pro Pro Pro Ser Leu Ser Pro Ala Glu Leu Val Gln Phe Leu
400 405 410 

AOC GAA MG OTT OOC GIT CAA TAG CAT TIC GCC TIC AGG TIG CIC GCT 2587 
Thr Glu Arg Val Pro Val Gln Tyr His Phe Ala Phe Arg Leu Leu Ala
415 420 425 

AAG TIG CAA GGG CTS KTC OCT OGA TPC OCA CIC GAC GAA CIC CTT AGA 2635 
Lys Leu Gln Gly Leu Ile Pro Arg Tyr Pro Leu Asp Glu Leu Leu Arg
430 435 440

GGA TAC ACE ACT GAT CTT ATC TIT OOC TUi TOG ACA GAG GCA GIC CAG 2683 
Gly Tyr Thr Thr Asp Leu Ile Phe Pro Leu Ser Thr Glu Ala Val Gln
445 450 455 460

GOT CGG A7\G 7W0G OCT ATC GPG ADC ACA GCT GAC TIG CTS GAC TM QCJT 2731
Ala Arg Lys Thr Pro Ile Glu Thr Thr Ala Asp Leu Leu Asp Tyr Gly
465 470 475 

CIA TCT (3TPi GCA OOC TGA GIC GCC GAG CI7V TIG CTC TAT CTC TCT TOG 2779 
Leu Cys Val Ala Gly Ser Val Ala Glu Leu Leu Val Tyr Val Ser Trp
480 485 490 

GCA AGT GCA OCA ACT CPG GIC OCT GOC AOC ATA GAA GAA A3A GAA OCT 2827 
Ala Ser Ala Pro Ser Gln Val Pro Ala Thr Ile Glu Glu Arg Glu Ala
495 500 505 

GIG TIA GIG OCA AGC OGA GAG ATS GGA ACT GOC CTT CAG TIG GIG AAC 2875 
Val Leu Val Ala Ser Arg Glu Met Gly Thr Ala Leu Gln Leu Val Asn
510 515 520 



ATT GCT AGG GAC ATT AAA Q3G GAC OCA ACA GAA GOG MA TTT T3VC CIA 2923 
Ile Ala Arg Asp Ile Lys Gly Asp Ala Thr Glu Gly Arg Phe Tyr Leu
525 530 535 540

CCA CTC TCA TTC TTT GCT CTT OOG GAT GAA TCA AAG CIT GOS ATC OOG 2971 
Pro Leu Ser Phe Phe Gly Leu Arg Asp Glu Ser Lys Leu Ala Ile Pro
545 550 555

ACT GAT TOG AOG GAA CCT CGG OCT CAA GAT TTC GAC AAA CIC CIC ACT 3019
Thr Asp Trp Thr Glu Pro Arg Pro Gln Asp Phe Asp Lys Leu Leu Ser
560 565 570 

dA TCT CCT TOG TOO ACA TTA OCA TCT TCA AAC GOC TCA GAA W3C TIC 3067
Leu Ser Pro Ser Ser Thr Leu Pro Ser Ser Asn Ala Ser Glu Ser Phe 
575 580 585 

03G TTC GAA TOG AAG AOG TAC TOG CIT OCA TTA GIC GCC TAC GCA GAG 3115 
Arg Phe Glu Trp Lys Thr Tyr Ser Leu Pro Leu Val Ala Tyr Ala Glu
590 595 600 

GAT CTT GOC AAA CKT TCT TAT AAG GGA MT GAC OGA CTT OCT ADC GPG 3163 
Asp Leu Ala Lys His Ser Tyr Lys Gly Ile Asp Arg Leu Pro Thr Glu
605 610 615 620

GIT CAA GOG GGA ATG GGA GOG GCT TGC GOG MC TAC CIA CTS ATC GOC 3211
Val Gln Ala Gly Met Arg Ala Ala Cys Ala Ser Tyr Leu Leu Ile Gly
625 630 635

OGA GAG ATC AAA GIC GIT TOG AAA GGA GAC GTC GGA GAG AGA AGG ACA 3259
Arg Glu Ile Lys Val Val Trp Lys Gly Asp Val Gly Glu Arg Arg Thr
640 645 650 

GIT GOC GGA TOG AGG AGA GIA OOG AAA GTC TTG ACT GIG GTC ATG AGC 3307
Val Ala Gly Trp Arg Arg Val Arg Lys Val Leu Ser Val Val Met Ser
655 660 665

GGA TOG GAA GGG CAG IT^AGACAGOS GAAGAAmT GACAGACAAT GA1GA1?ZGAG 3362
Gly Trp Glu Gly Gln
670 

AATT^AAAIOl TCCICRATCT TCTTICrCTA GfflGCTCTTT ' I ' l ' IUl ' ITIUl ' AXIATGACXA 3422 

ACiCBMQG AACiaoocrr GCACsmviTT crcrrocxxx: cMcrrocrc cmtxitfiu; 3482

TTICTTCnT CXSOrnTCT CX3GmACIA TCTCAKITCT TTTIUrTGCT TmcnKIC 3542 
AATdAGA 3550 
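
The feature table of SEQ ID NO:22 above can be checked for internal consistency by summing the exon segments joined in its CDS location and comparing the result with the stated protein length of 673 amino acids. The following is only a minimal sketch and is not part of the original listing: the helper name `cds_length` is hypothetical, the location string is transcribed from the CDS feature, and it is assumed that the CDS range includes the stop codon.

```python
import re


def cds_length(location: str) -> int:
    """Return the number of nucleotides covered by a 'start..end' or join(...) location."""
    # Every segment is an inclusive, 1-based range written as start..end.
    segments = re.findall(r"(\d+)\s*\.\.\s*(\d+)", location)
    return sum(int(end) - int(start) + 1 for start, end in segments)


# CDS location of SEQ ID NO:22, transcribed from its feature table.
location = "join(941..966, 1078..1284, 1365..1877, 1960..2202, 2293..3325)"

nucleotides = cds_length(location)
codons, remainder = divmod(nucleotides, 3)

print(nucleotides)   # 2022
print(remainder)     # 0, so the joined exons form a whole number of codons
print(codons - 1)    # 673 residues once the stop codon is excluded
```

Under the same assumption, the single-range location 141..896 of SEQ ID NO:20 gives 756 nucleotides, that is 251 codons plus a stop codon, which agrees with the 251 amino acids stated for SEQ ID NO:21.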

(2) INFORMATION FOR SEQ ID NO:23:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 673 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:

Met Thr Ala Leu Ala Tyr Tyr Gln Ile His Leu Ile Tyr Thr Leu Pro
1 5 10 15

Ile Leu Gly Leu Leu Gly Leu Leu Thr Ser Pro Ile Leu Thr Lys Phe
20 25 30

Asp Ile Tyr Lys Ile Ser Ile Leu Val Phe Ile Ala Phe Ser Ala Thr
35 40 45

Thr Pro Trp Asp Ser Trp Ile Ile Arg Asn Gly Ala Trp Thr Tyr Pro
50 55 60

Ser Ala Glu Ser Gly Gln Gly Val Phe Gly Thr Phe Leu Asp Val Pro
65 70 75 80

Tyr Glu Glu Tyr Ala Phe Phe Val Ile Gln Thr Val Ile Thr Gly Leu
85 90 95

Val Tyr Val Leu Ala Thr Arg His Leu Leu Pro Ser Leu Ala Leu Pro
100 105 110

Lys Thr Arg Ser Ser Ala Leu Ser Leu Ala Leu Lys Ala Leu Ile Pro
115 120 125

Leu Pro Ile Ile Tyr Leu Phe Thr Ala His Pro Ser Pro Ser Pro Asp
130 135 140

Pro Leu Val Thr Asp His Tyr Phe Tyr Met Arg Ala Leu Ser Leu Leu
145 150 155 160

Ile Thr Pro Pro Thr Met Leu Leu Ala Ala Leu Ser Gly Glu Tyr Ala
165 170 175

Phe Asp Trp Lys Ser Gly Arg Ala Lys Ser Thr Ile Ala Ala Ile Met
180 185 190

Ile Pro Thr Val Tyr Leu Ile Trp Val Asp Tyr Val Ala Val Gly Gln
195 200 205

Asp Ser Trp Ser Ile Asn Asp Glu Lys Ile Val Gly Trp Arg Leu Gly
210 215 220

Gly Val Leu Pro Ile Glu Glu Ala Met Phe Phe Leu Leu Thr Asn Leu
225 230 235 240

Met Ile Val Leu Gly Leu Ser Ala Cys Asp His Thr Gln Ala Leu Tyr
245 250 255

Leu Leu His Gly Arg Thr Ile Tyr Gly Asn Lys Lys Met Pro Ser Ser
260 265 270

Phe Pro Leu Ile Thr Pro Pro Val Leu Ser Leu Phe Phe Ser Ser Arg
275 280 285

Pro Tyr Ser Ser Gln Pro Lys Arg Asp Leu Glu Leu Ala Val Lys Leu
290 295 300

Leu Glu Lys Lys Ser Arg Ser Phe Phe Val Ala Ser Ala Gly Phe Pro
305 310 315 320

Ser Glu Val Arg Glu Arg Leu Val Gly Tyr Ala Phe Cys Arg Val Thr
325 330 335

Asp Asp Leu Ile Asp Ser Pro Glu Val Ser Ser Asn Pro His Ala Thr
340 345 350

Ile Asp Met Val Ser Asp Phe Leu Thr Leu Leu Phe Gly Pro Pro Leu
355 360 365

His Pro Ser Gln Pro Asp Lys Ile Leu Ser Ser Pro Leu Leu Pro Pro
370 375 380

Ser His Pro Ser Arg Pro Thr Gly Met Tyr Pro Leu Pro Pro Pro Pro
385 390 395 400

Ser Leu Ser Pro Ala Glu Leu Val Gln Phe Leu Thr Glu Arg Val Pro
405 410 415

Val Gln Tyr His Phe Ala Phe Arg Leu Leu Ala Lys Leu Gln Gly Leu
420 425 430

Ile Pro Arg Tyr Pro Leu Asp Glu Leu Leu Arg Gly Tyr Thr Thr Asp
435 440 445

Leu Ile Phe Pro Leu Ser Thr Glu Ala Val Gln Ala Arg Lys Thr Pro
450 455 460

Ile Glu Thr Thr Ala Asp Leu Leu Asp Tyr Gly Leu Cys Val Ala Gly
465 470 475 480

Ser Val Ala Glu Leu Leu Val Tyr Val Ser Trp Ala Ser Ala Pro Ser
485 490 495

Gln Val Pro Ala Thr Ile Glu Glu Arg Glu Ala Val Leu Val Ala Ser
500 505 510

Arg Glu Met Gly Thr Ala Leu Gln Leu Val Asn Ile Ala Arg Asp Ile
515 520 525

Lys Gly Asp Ala Thr Glu Gly Arg Phe Tyr Leu Pro Leu Ser Phe Phe
530 535 540

Gly Leu Arg Asp Glu Ser Lys Leu Ala Ile Pro Thr Asp Trp Thr Glu
545 550 555 560

Pro Arg Pro Gln Asp Phe Asp Lys Leu Leu Ser Leu Ser Pro Ser Ser
565 570 575

Thr Leu Pro Ser Ser Asn Ala Ser Glu Ser Phe Arg Phe Glu Trp Lys
580 585 590

Thr Tyr Ser Leu Pro Leu Val Ala Tyr Ala Glu Asp Leu Ala Lys His
595 600 605

Ser Tyr Lys Gly Ile Asp Arg Leu Pro Thr Glu Val Gln Ala Gly Met
610 615 620

Arg Ala Ala Cys Ala Ser Tyr Leu Leu Ile Gly Arg Glu Ile Lys Val
625 630 635 640

Val Trp Lys Gly Asp Val Gly Glu Arg Arg Thr Val Ala Gly Trp Arg
645 650 655

Arg Val Arg Lys Val Leu Ser Val Val Met Ser Gly Trp Glu Gly Gln
660 665 670



(2) INFORMATION FOR SEQ ID NO:24:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 570 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Phaffia rhodozyma

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 24..500
(D) OTHER INFORMATION: /product= "PRcDNA10"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:

AACAL'i'lUb'i* TfiffnTOGAC GAC ATCS C3^ A3C TIC C?IA AAG ADC CTC A0C5 50 

Met Gln Ile Phe Val Lys Thr Leu Thr
1 5 

OCT AAG AOC A3t: AOC CTT GAG GIG GAG TCT TCP GAC ADC ATC GRC AAC 98 
Gly Lys Thr Ile Thr Leu Glu Val Glu Ser Ser Asp Thr Ile Asp Asn
10 15 20 25 

arc PiPG GCC AAG KTC CPC GAC AAG GAA GGA ATT OCT OCT GAT CMS CPG 146 
Val Lys Ala Lys Ile Gln Asp Lys Glu Gly Ile Pro Pro Asp Gln Gln
30 35 40 

OGA err ATC TIC GOC OCT AAG CBG CIC GAG GAT GGC OGA ADC CTT TOG 194 
Arg Leu Ile Phe Ala Gly Lys Gln Leu Glu Asp Gly Arg Thr Leu Ser
45 50 55 

GAP 1AC AAC ATC CAG AAA GAG TOC ADC CTC GAC CIC GIC CTT AGG TIG 242 
Asp Tyr Asn Ile Gln Lys Glu Ser Thr Leu His Leu Val Leu Arg Leu
60 65 70 

OGA GGA GGA GOC AAG AAG OGA AAG AAG PJC CMS TAC ACT AOC CCC A;^ 290 
Arg Gly Gly Ala Lys Lys Arg Lys Lys Lys Gln Tyr Thr Thr Pro Lys
75 80 85 

AAG ATC AAG CAC AAG OGA AAG AAG GTIC AAG ATG GOT ATT OT AAG TAG 338 
Lys Ile Lys His Lys Arg Lys Lys Val Lys Met Ala Ile Leu Lys Tyr
90 95 100 105

TAC AAG GIC GAC TCT GAT GGA AAG ATC AAG OGA CTT OGT OGA GAG T3C 386
Tyr Lys Val Asp Ser Asp Gly Lys Ile Lys Arg Leu Arg Arg Glu Cys
110 115 120 

OCC CRG OOC CAG 1GC OGA GCT GGTT ATC TTC ATG GCT TIC 0\C TCC AAC 434 
Pro Gln Pro Gln Cys Gly Ala Gly Ile Phe Met Ala Phe His Ser Asn
125 130 135 

03A CAG ACT TGC GGA AAG TGT GCT CTT P€C TAG AOC TTC GOC GAG GGA 482 
Arg Gln Thr Cys Gly Lys Cys Gly Leu Thr Tyr Thr Phe Ala Glu Gly
140 145 150

AOC CAG OOC TOT GOT mSKrCKTCA ATTOnTCIT COOGAGOGAT CITIGAGICT 537 
Thr Gln Pro Ser Ala
155

I'luriftCMT CICAAAAAAA AAAAAAAAAA AAA 570 



(2) INFORMATION FOR SEQ ID NO:25:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 158 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:

Met Gln Ile Phe Val Lys Thr Leu Thr Gly Lys Thr Ile Thr Leu Glu
1 5 10 15

Val Glu Ser Ser Asp Thr Ile Asp Asn Val Lys Ala Lys Ile Gln Asp
20 25 30

Lys Glu Gly Ile Pro Pro Asp Gln Gln Arg Leu Ile Phe Ala Gly Lys
35 40 45

Gln Leu Glu Asp Gly Arg Thr Leu Ser Asp Tyr Asn Ile Gln Lys Glu
50 55 60

Ser Thr Leu His Leu Val Leu Arg Leu Arg Gly Gly Ala Lys Lys Arg
65 70 75 80

Lys Lys Lys Gln Tyr Thr Thr Pro Lys Lys Ile Lys His Lys Arg Lys
85 90 95

Lys Val Lys Met Ala Ile Leu Lys Tyr Tyr Lys Val Asp Ser Asp Gly
100 105 110

Lys Ile Lys Arg Leu Arg Arg Glu Cys Pro Gln Pro Gln Cys Gly Ala
115 120 125

Gly Ile Phe Met Ala Phe His Ser Asn Arg Gln Thr Cys Gly Lys Cys
130 135 140

Gly Leu Thr Tyr Thr Phe Ala Glu Gly Thr Gln Pro Ser Ala
145 150 155



(2) INFORMATION FOR SEQ ID NO:26:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 303 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Phaffia rhodozyma

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 57..278
(D) OTHER INFORMATION: /product= "PRcDNA11"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:

TnM3\CAOl AAOCITOCCT AOdTITCAA CAACSWVTCA CW^CIAAGCT TACATC 56 

ATG GRG TCC ATC AM ACXT TOS ATT TOC AAC OCX: GCC AAC mC GCT TCT 104 
Met Glu Ser Ile Lys Thr Ser Ile Ser Asn Ala Ala Asn Tyr Ala Ser
1 5 10 15

GAGACTCnCAACOGGCrACTAQCGCrAlXTCrAAGGAGQCC 152 
Glu Thr Val Asn Gln Ala Thr Ser Ala Thr Ser Lys Glu Ala Asn Lys
20 25 30

GAG GIT GCC AAG GAC TCC AAT GCC GGA C?IT GQA ACC GGA ATC AAC GOC 200
Glu Val Ala Lys Asp Ser Asn Ala Gly Val Gly Thr Arg Ile Asn Ala
35 40 45 



GGA ATT GAT GCT CTT GGA GRCAMGCCGACGAGACTTOGTCr GAT QCC 248 
Gly Ile Asp Ala Leu Gly Asp Lys Ala Asp Glu Thr Ser Ser Asp Ala
50 55 60 

AM TCC AAG GOC TMZ AAG GAG AAC ATC TOAbTIA'i'I'i' AGAXAGTCGT 295 
Lys Ser Lys Ala Tyr Lys Gln Asn Ile
65 70 

OCATRTIT 303 



(2) INFORMATION FOR SEQ ID NO:27:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 73 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:

Met Glu Ser Ile Lys Thr Ser Ile Ser Asn Ala Ala Asn Tyr Ala Ser
1 5 10 15

Glu Thr Val Asn Gln Ala Thr Ser Ala Thr Ser Lys Glu Ala Asn Lys
20 25 30

Glu Val Ala Lys Asp Ser Asn Ala Gly Val Gly Thr Arg Ile Asn Ala
35 40 45

Gly Ile Asp Ala Leu Gly Asp Lys Ala Asp Glu Thr Ser Ser Asp Ala
50 55 60

Lys Ser Lys Ala Tyr Lys Gln Asn Ile
65 70



(2) INFORMATION FOR SEQ ID NO:28:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 307 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Phaffia rhodozyma

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 3..227
(D) OTHER INFORMATION: /product= "PRcDNA18"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:

AC CXT TOC ATC GfiG TCT GAG GOC CX3A CAA CAC AAG CTC AAG AOG CTT 47 
Pro Ser Ile Glu Ser Glu Ala Arg Gln His Lys Leu Lys Arg Leu
1 5 10 15



CTS CAG AGC OOC AAC TCT TIC TTC ATC GAC CJIC AAG TGC CXT QGTr IOC 95 
Val Gln Ser Pro Asn Ser Phe Phe Met Asp Val Lys Cys Pro Gly Cys
20 25 30 

TIC Otfj ATC ADC ACC (JTG TTC TCG CAC OCT TCC ACT GOC GIT CM IGT 143 
Phe Gln Ile Thr Thr Val Phe Ser His Ala Ser Thr Ala Val Gln Cys
35 40 45 

QGA ICG TC3C CAG AOC ATC CTC T3C CAG OCC OSG GGA QGA AAG GCT OGA 191 
Gly Ser Cys Gln Thr Ile Leu Cys Gln Pro Arg Gly Gly Lys Ala Arg
50 55 60 

CIT AOC Gft3 GGA TQC TCT TIC OCSA 0C3A AAG AAC TAAGITICIG TIATOGGATO 244 
Leu Thr Glu Gly Cys Ser Phe Arg Arg Lys Asn

65 70 75 

ATGCATTCAA AIAAAAGTCA AAAAAAAAAA AAAAAAAAAC TGGAOGGGQG GGOOQGIACC 304 

CAA 307 



(2) INFORMATION FOR SEQ ID NO:29:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 74 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:

Pro Ser Ile Glu Ser Glu Ala Arg Gln His Lys Leu Lys Arg Leu Val
1 5 10 15

Gln Ser Pro Asn Ser Phe Phe Met Asp Val Lys Cys Pro Gly Cys Phe
20 25 30

Gln Ile Thr Thr Val Phe Ser His Ala Ser Thr Ala Val Gln Cys Gly
35 40 45

Ser Cys Gln Thr Ile Leu Cys Gln Pro Arg Gly Gly Lys Ala Arg Leu
50 55 60

Thr Glu Gly Cys Ser Phe Arg Arg Lys Asn
65 70



(2) INFORMATION FOR SEQ ID NO:30:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 502 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Phaffia rhodozyma

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 30..359
(D) OTHER INFORMATION: /product= "PRcDNA35"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30:

GTCP€CTCCG GCTIMATOG ATTOCJrACA ATG TCT GAA CIC GCC GOC TOC T3VC 53 

Met Ser Glu Leu Ala Ala Ser Tyr 
1 5 

GCC GCr dT ATC OTC GOC GAC GAG OCT ATT GAG ATC ACC TCT GAG AAG 101 
Ala Ala Leu Ile Leu Ala Asp Glu Gly Ile Glu Ile Thr Ser Glu Lys
10 15 20 

CTC GTC ACT CTC ACT ADC GOC GCC AAG GIT GAG CIT GAG COC ATC TC9G 149 
Leu Val Thr Leu Thr Thr Ala Ala Lys Val Glu Leu Glu Pro Ile Trp
25 30 35 40 

GOC ACT CIC err OOC AAG QCC CTC GN3 GGA AfiG AAC GTC AAG GAG TPS 197 
Ala Thr Leu Leu Ala Lys Ala Leu Glu Gly Lys Asn Val Lys Glu Leu
45 50 55 

CIT TCX: AAC GTC GGA TOC GGA GOC GGA GGA OCT GOC OOC GOC GOC GOC 245 
Leu Ser Asn Val Gly Ser Gly Ala Gly Gly Ala Ala Pro Ala Ala Ala 
60 65 70 

GTC GOC GGT GGA GCT TOC GOT GAC GCC TCT GCC OCC GCT GAG GAG Aft3 293 
Val Ala Gly Gly Ala Ser Ala Asp Ala Ser Ala Pro Ala Glu Glu Lys 
75 80 85 

AAG GAG GAG AAG GCT GAG GAC AAG GAG GAG TCT GAC GAC GAC AIG GGT 341 
Lys Glu Glu Lys Ala Glu Asp Lys Glu Glu Ser Asp Asp Asp Met Gly 
90 95 100 

TIC GGA CIT TIC GAT TAAACTCOCT OQOCDWUA OA-TlTlUrr CAACTJOOCrC 396 
Phe Gly Leu Phe Asp 
105 110 

TOGTOQCATC GTiUACIOGA OOGCIGOSIT ' lUnUilXTr TOCTCAaSAA ' miUlLVlT 456 
GrCIGOTlTC OCAKINaaAT NTOCTIGAAA TSANGmcC CAATIG 502 



(2) INFORMATION FOR SEQ ID NO:31:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 109 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:

Met Ser Glu Leu Ala Ala Ser Tyr Ala Ala Leu Ile Leu Ala Asp Glu
1 5 10 15

Gly Ile Glu Ile Thr Ser Glu Lys Leu Val Thr Leu Thr Thr Ala Ala
20 25 30

Lys Val Glu Leu Glu Pro Ile Trp Ala Thr Leu Leu Ala Lys Ala Leu
35 40 45

Glu Gly Lys Asn Val Lys Glu Leu Leu Ser Asn Val Gly Ser Gly Ala
50 55 60

Gly Gly Ala Ala Pro Ala Ala Ala Val Ala Gly Gly Ala Ser Ala Asp
65 70 75 80

Ala Ser Ala Pro Ala Glu Glu Lys Lys Glu Glu Lys Ala Glu Asp Lys
85 90 95

Glu Glu Ser Asp Asp Asp Met Gly Phe Gly Leu Phe Asp
100 105



(2) INFORMATION FOR SEQ ID NO:32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 381 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma

(ix) FEATURE:
(A) NAME/KEY: CDS

(B) LOCATION: 7..282

(D) OTHER INFORMATION: /product= "PRcDNA36"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:

CTCMG ATS AOC AAA GGT ACC TOC lUT TTC GGT AAG OGA CAC AOC AAG 48 
Met Thr Lys Gly Thr Ser Ser Phe Gly Lys Arg His Thr Lys
1 5 10

ACX: CPC ACT ATC IGC OSA OGA TCT GGT AAC AQG GCT TTC CPC AOS CSG 96 
Thr His Thr Ile Cys Arg Arg Cys Gly Asn Arg Ala Phe His Arg Gln
15 20 25 30 

AAGAAGaarTCTGCCCAGTCrOGATTyrOCTGCrGCrAAG^^ 144
Lys Lys Thr Cys Ala Gln Cys Gly Tyr Pro Ala Ala Lys Met Arg Ser
35 40 45 

TIC AAC TOG GGA GAG AAG GOC AAG AOG AGA AAG ACC ACC GGT AOC OCT 192 
Phe Asn Trp Gly Glu Lys Ala Lys Arg Arg Lys Thr Thr Gly Thr Gly
50 55 60

OGA ATG CAG CAC CTC AAG GAC GTC TCT OSA OSA TTC AfiG AAC GQC TIC 240 
Arg Met Gln His Leu Lys Asp Val Ser Arg Arg Phe Lys Asn Gly Phe
65 70 75

CX3A GAG GGA ACT TOC GOC ACC AAG AAG CTC AAG QCX: GRG lAATOGOTTr 289 
Arg Glu Gly Thr Ser Ala Thr Lys Lys Val Lys Ala Glu
80 85 90 

ATOCATCADC TOGTrSKTCAG GQaaQCJIAAT AATCTmCT TAGfiGACIAT OXIUTltJlU 349 
CP30CX3CAIC AAACAAAAAA AAAAAAAAAA AA 381 

(2) INFORMATION FOR SEQ ID NO:33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 91 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33:

Met Thr Lys Gly Thr Ser Ser Phe Gly Lys Arg His Thr Lys Thr His
1 5 10 15

Thr Ile Cys Arg Arg Cys Gly Asn Arg Ala Phe His Arg Gln Lys Lys
20 25 30

Thr Cys Ala Gln Cys Gly Tyr Pro Ala Ala Lys Met Arg Ser Phe Asn
35 40 45

Trp Gly Glu Lys Ala Lys Arg Arg Lys Thr Thr Gly Thr Gly Arg Met
50 55 60

Gln His Leu Lys Asp Val Ser Arg Arg Phe Lys Asn Gly Phe Arg Glu
65 70 75 80

Gly Thr Ser Ala Thr Lys Lys Val Lys Ala Glu 
85 90 

(2) INFORMATION FOR SEQ ID NO:34:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 473 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 19..321

(D) OTHER INFORMATION: /product= "PRcDNA46"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34:

CICAflfiAAGA AACrCSQCX: ATG CCT AOC OGA TIC TOC AAC ADC OoA AAG CAC 51
Met Pro Thr Arg Phe Ser Asn Thr Arg Lys His
1 5 10

AGA OGA CAC GTC TCT GOC OCT CAC 03T CCTT GIG GGA CAC AGA AAG 99
Arg Gly His Val Ser Ala Gly His Gly Arg Val Gly Lys His Arg Lys 
15 20 25 

CAC OCA GGA OGA 03A GGT CIT GCT GGA GGA CAG CAC CAC CAC OGA ACC 147 
His Pro Gly Gly Arg Gly Leu Ala Gly Gly Gln His His His Arg Thr
30 35 40 

AAC TIC GAT AAG TMZ CAC OCT GGA TAC TIC GGA AAG GIC GGA AIG AGG 195 
Asn Phe Asp Lys Tyr His Pro Gly Tyr Phe Gly Lys Val Gly Met Arg
45 50 55

CAC TIC CAC err ADC OGA NAC TCT TCX:: tog TGC OCT AOC GIC AAC MT 243 
His Phe His Leu Thr Arg Xaa Ser Ser Trp Cys Pro Thr Val Asn Ile
60 65 70 75 

GAC NTWa CIC TOG ACT CIC GIC OOC GCT GAG GAG AAG AAG GAC TIC COC 291 
Asp Xaa Leu Trp Thr Leu Val Pro Ala Glu Glu Lys Lys Asp Phe Pro
80 85 90 

AAC CAG GCT OGA OCT OGT COC OGT TGT TGACACmG GCTCICGGrr 338 
Asn Gln Ala Arg Pro Arg Pro Arg Cys
95 100 

AOaGOAAIGT TCTIGGCAAG GGICEACTIC OOCAGATOOC TTIAATOTIC AAQGCOOGAT 398

TCNmOOGC TCnOOOGfiG AANAANRTCU ANSAMSCiaG TTOGAArPCX: ' ILUUXJCm ' 458 

GTICOCOOCN T3«NG 473 

(2) INFORMATION FOR SEQ ID NO:35:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 100 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35:

Met Pro Thr Arg Phe Ser Asn Thr Arg Lys His Arg Gly His Val Ser
1 5 10 15

Ala Gly His Gly Arg Val Gly Lys His Arg Lys His Pro Gly Gly Arg
20 25 30

Gly Leu Ala Gly Gly Gln His His His Arg Thr Asn Phe Asp Lys Tyr
35 40 45

His Pro Gly Tyr Phe Gly Lys Val Gly Met Arg His Phe His Leu Thr
50 55 60

Arg Xaa Ser Ser Trp Cys Pro Thr Val Asn Ile Asp Xaa Leu Trp Thr
65 70 75 80

Leu Val Pro Ala Glu Glu Lys Lys Asp Phe Pro Asn Gln Ala Arg Pro
85 90 95

Arg Pro Arg Cys
100



(2) INFORMATION FOR SEQ ID NO:36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 608 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 18..453

(D) OTHER INFORMATION: /product= "PRcDNA64"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36:

AflfiACTOCTC GITCaGC ATS TOC TOO GTC AAA QOC AOC AAA OGA AAG GOT 50 
Met Ser Ser Val Lys Ala Thr Lys Gly Lys Gly
1 5 10

CXr GGC GCr tog GCT gat git AAG GCC AAG QOC GOC AAG AAG OCT GOC 9B 
Pro Ala Ala Ser Ala Asp Val Lys Ala Lys Ala Ala Lys Lys Ala Ala 
15 20 25

Crc AAG GGT ACT CPG TCT ACT TOC ACT U3G AAG GTC OGA ACT TOG GTIC 146 
Leu Lys Gly Thr Gin Ser Thr Ser Thr Arg Lys Val Arg Thr Ser Val 
30 35 40 

TCT TTC CM2 OGA OOC AAG ACT CTC OGA CTT COC 03A OCT OOC AAG 194 
Ser Phe His Arg Pro Lys Thr Leu Arg Leu Pro Arg Ala Pro Lys Tyr
45 50 55 

COC OGA AAG TOG CTC OCT CAC GOC OCT OGA ATG GAT GAG TTC OGA ACT 242
Pro Arg Lys Ser Val Pro His Ala Pro Arg Met Asp Glu Phe Arg Thr 
60 65 70 75 

ATC ATC CPC OOC TTG GCT AOC GAG TCC QOC ATG AAG AAG ATT GAG GAG 290 
Ile Ile His Pro Leu Ala Thr Glu Ser Ala Met Lys Lys Ile Glu Glu
80 85 90 

CTC AAC ADC CTT GIG TIC ATC GTC GKT CTC AAG TOC AAC AAG OGA CM 338 
His Asn Thr Leu Val Phe Ile Val Asp Val Lys Ser Asn Lys Arg Gln
95 100 105

ATC AAG GAC GOC GTC AAG AAG CTC TPC GAG GTC GAT ADC GTC CAC OTC 386 
Ile Lys Asp Ala Val Lys Lys Leu Tyr Glu Val Asp Thr Val His Xaa
110 115 120 

AAC NOC TIG ATC ADC OOC QOC OGA AGG AAG AM3 CTT AOG TOC GAC TTA 434 
Asn Xaa Leu Ile Thr Pro Ala Gly Arg Lys Lys Leu Thr Ser Asp Leu
125 130 135

OOC COG ADC AOS ADG CTC T TAA0G?IT3CC AACAMGOOG QCIACATCIA 483
Pro Pro Thr Thr Thr Leu 
140 145 

ATOGACrOCA TOOCTIGGAT OGGTrcAGiT GlTiUJmU CATOOGGnT CSGACTITGA 543 

OGAOCTIGAA ACTCMAANAC TTTGGATGCA TGriTQAAAT TCIQ^AAAEA AAAAAAAAAA 603 
AAAAA 608

(2) INFORMATION FOR SEQ ID NO:37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 145 amino acids
(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37:

Met Ser Ser Val Lys Ala Thr Lys Gly Lys Gly Pro Ala Ala Ser Ala
1 5 10 15

Asp Val Lys Ala Lys Ala Ala Lys Lys Ala Ala Leu Lys Gly Thr Gln
20 25 30

Ser Thr Ser Thr Arg Lys Val Arg Thr Ser Val Ser Phe His Arg Pro
35 40 45

Lys Thr Leu Arg Leu Pro Arg Ala Pro Lys Tyr Pro Arg Lys Ser Val
50 55 60

Pro His Ala Pro Arg Met Asp Glu Phe Arg Thr Ile Ile His Pro Leu
65 70 75 80

Ala Thr Glu Ser Ala Met Lys Lys Ile Glu Glu His Asn Thr Leu Val
85 90 95

Phe Ile Val Asp Val Lys Ser Asn Lys Arg Gln Ile Lys Asp Ala Val
100 105 110

Lys Lys Leu Tyr Glu Val Asp Thr Val His Xaa Asn Xaa Leu Ile Thr
115 120 125

Pro Ala Gly Arg Lys Lys Leu Thr Ser Asp Leu Pro Pro Thr Thr Thr
130 135 140

Leu
145

(2) INFORMATION FOR SEQ ID NO:38:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 466 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma

(ix) FEATURE:
(A) NAME/KEY: CDS

(B) LOCATION: 81..416

(D) OTHER INFORMATION: /product= "PRciCNAfiS"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:

dTTGAAOCr OCAfiOCIOaG CATCAAGCAC TT^CTCAQCXrT OGGCnAAAT OGATTOTICT 60 

AGOCnrCAA AL'lUaiAAAA AIG AAG CMZ ATC QCC GCT 1AC TIG CIC CTC HO 
Met Lys His Ile Ala Ala Tyr Leu Leu Leu
1 5 10

Gcx: Acx: acrr oga aac nx tqc ccc tct goc goc gat gtc aag goc cic ise 

Ala Thr Gly Gly Asn Xaa Ser Pro Ser Ala Ala Asp Val Lys Ala Leu
15 20 25

err GOC Aoc gtc gac atc grg got gat gac goc oga err gag acc ctc 206 
Leu Ala Thr Val Asp Ile Glu Ala Asp Asp Ala Arg Leu Glu Thr Leu
30 35 40 

ATX: TOC GAG CTT AAC GQC AAG GAC TIG AAC ACC CTC ATC GOT GAG GGA 254 
Ile Ser Glu Leu Asn Gly Lys Asp Leu Asn Thr Leu Ile Ala Glu Gly
45 50 55 

TOC GOC AAG CTC GCT TOC GTC OOC TOC GGA GSA GOC GOC TOT TOC OCT 302
Ser Ala Lys Leu Ala Ser Val Pro Ser Gly Gly Ala Ala Ser Ser Ala 
60 65 70 

GOC OOC GOC GOC GOT GGA GGA GOC GOC GOC OCT GOC GOT GAG GAT AAG 350 
Ala Pro Ala Ala Ala Gly Gly Ala Ala Ala Pro Ala Ala Glu Asp Lys
75 80 85 90 

AAG GAG GAG AAG GTC GAG GAC AAG GAG CfG TUT GAC GAC CSPC AIG GGTT 398 
Lys Glu Glu Lys Val Glu Asp Lys Glu Glu Ser Asp Asp Asp Met Gly 
95 100 105

TTC GGA CTT TTC GAT TOAACTOCTT ACAOCTmT CAAACrCITC Gi' I UjC'XtJG A 453 
Phe Gly Leu Phe Asp 
110 

GGQQQQQOGC GGTT 466 



(2) INFORMATION FOR SEQ ID NO:39:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 111 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39:

Met Lys His Ile Ala Ala Tyr Leu Leu Leu Ala Thr Gly Gly Asn Xaa
1 5 10 15

Ser Pro Ser Ala Ala Asp Val Lys Ala Leu Leu Ala Thr Val Asp Ile
20 25 30

Glu Ala Asp Asp Ala Arg Leu Glu Thr Leu Ile Ser Glu Leu Asn Gly
35 40 45

Lys Asp Leu Asn Thr Leu Ile Ala Glu Gly Ser Ala Lys Leu Ala Ser
50 55 60

Val Pro Ser Gly Gly Ala Ala Ser Ser Ala Ala Pro Ala Ala Ala Gly
65 70 75 80

Gly Ala Ala Ala Pro Ala Ala Glu Asp Lys Lys Glu Glu Lys Val Glu
85 90 95

Asp Lys Glu Glu Ser Asp Asp Asp Met Gly Phe Gly Leu Phe Asp
100 105 110

(2) INFORMATION FOR SEQ ID NO:40:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 570 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma

(ix) FEATURE:
(A) NAME/KEY: CDS

(B) LOCATION: 49..501

(D) OTHER INFORMATION: /product= "PRcDNA73"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40:

cn txrixxu s tcaaggcaaa ccncnGAAT ociciowct cmtcaac ats gga oga 57 

Met Gly Arg 

1 

GTC OaC ACT AAA ADC GTC AAG CX3A GCT TOS OCS^ GIG ATG ATC GAG AflG 105 
Val Arg Thr Lys Thr Val Lys Arg Ala Ser Arg Val Met Ile Glu Lys
5 10 15 

TTC TPiC OCT OGA CTC ACT CTT GRT TTC CAC ADC AAC AW3 OGA OTC GOC 153 
Phe Tyr Pro Arg Leu Thr Leu Asp Phe His Thr Asn Lys Arg Ile Ala
20 25 30 35 

GAC GAG C7IT GOC ATC ATC COC TOC AAG OSA CTT OGA AAC AM ATC OCT 201 
Asp Glu Val Ala Ile Ile Pro Ser Lys Arg Leu Arg Asn Lys Ile Ala
40 45 50 

GOG TTC ACT ADC CAC TIG ATG AAG OGAATCGAGAAGGGACaCGrTOGA 249 
Gly Phe Thr Thr His Leu Met Lys Arg Ile Gln Lys Gly Pro Val Arg
55 60 65 

OCT ATC TOC TIC AftG CTTCAGGfiGGAGGAGOGAGfiGAGG AAG GAT CAG 297 
Gly Ile Ser Phe Lys Leu Gln Glu Glu Glu Arg Glu Arg Lys Asp Gln
70 75 80 

T3\C GTT OCT GAG GTC TOC GOC CTT GOC GOC OCT GAG CIG OCT TIG GAG 345 
Tyr Val Pro Glu Val Ser Ala Leu Ala Ala Pro Glu Leu Gly Leu Glu
85 90 95

GIT GAC OOC GAC ADC AAG GAT CTT CTC OGA TOC CTT QGC ATG GAC TOC 393 
Val Asp Pro Asp Thr Lys Asp Leu Leu Arg Ser Leu Gly Met Asp Ser
100 105 110 115 

OTC AAC GTC C7V3 GIC TOC OCT CCT ATC TCT TOC TAG GOT GOC COC GAG 441 
Ile Asn Val Gln Val Ser Ala Pro Ile Ser Ser Tyr Ala Ala Pro Glu
120 125 130 

OGA GGT OCC OGA GGT GOC GGAOGANGTGGAOGAATCGrCCOC GGA GOT 489 
Arg Gly Pro Arg Gly Ala Gly Arg Xaa Gly Arg Ile Val Pro Gly Ala
135 140 145 

GOC OGA TAC TAAGTOnTT CTTCAADCAN GGGAlTVTnG AINATTOGCT 538 
Gly Arg Tyr 
150 

AOaCITGAAA ' mTi ' lTA TC ATTOITCCm TA 570 



(2) INFORMATION FOR SEQ ID NO:41:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 150 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41:

Met Gly Arg Val Arg Thr Lys Thr Val Lys Arg Ala Ser Arg Val Met
1 5 10 15

Ile Glu Lys Phe Tyr Pro Arg Leu Thr Leu Asp Phe His Thr Asn Lys
20 25 30

Arg Ile Ala Asp Glu Val Ala Ile Ile Pro Ser Lys Arg Leu Arg Asn
35 40 45

Lys Ile Ala Gly Phe Thr Thr His Leu Met Lys Arg Ile Gln Lys Gly
50 55 60

Pro Val Arg Gly Ile Ser Phe Lys Leu Gln Glu Glu Glu Arg Glu Arg
65 70 75 80

Lys Asp Gln Tyr Val Pro Glu Val Ser Ala Leu Ala Ala Pro Glu Leu
85 90 95

Gly Leu Glu Val Asp Pro Asp Thr Lys Asp Leu Leu Arg Ser Leu Gly
100 105 110

Met Asp Ser Ile Asn Val Gln Val Ser Ala Pro Ile Ser Ser Tyr Ala
115 120 125

Ala Pro Glu Arg Gly Pro Arg Gly Ala Gly Arg Xaa Gly Arg Ile Val
130 135 140

Pro Gly Ala Gly Arg Tyr 
145 150 



(2) INFORMATION FOR SEQ ID NO:42:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 373 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 13..324

(D) OTHER INFORMATION: /product= "PRcDNA76"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42:

CCKTCATOCA ACATCCrrcrCAAACTCAfiGGOCAWSAlXQGrGrCGCjr 48 
Met Pro Pro Lys Val Lys Ala Lys Thr Gly Val Gly
1 5 10

AAG ACC CW3 ABG AAG AAG AflG T3G TOC AAG GGA AAG GIG AfiG GAC AAG 96
Lys Thr Gln Lys Lys Lys Lys Trp Ser Lys Gly Lys Val Lys Asp Lys
15 20 25 

QOC GCX: CftC CRC GTC OTT GTT GAT CPG GCC ACT TAC GAC AAG ATC C?IT 144
Ala Ala His His Val Val Val Asp Gln Ala Thr Tyr Asp Lys Ile Val
30 35 40 

AflG &G GIC CX)C ACC m: AfiG TIG ATC TOC GAG TCT AIC TIG ATT GAC 192 
Lys Glu Val Pro Thr Tyr Lys Leu Ile Ser Gln Ser Ile Leu Ile Asp
45 50 55 60

OGA CRC AAG GIT AAC GGT TOC GTC QOC CGA GCC OCT AIC CX3A CAC CTT 240 
Arg His Lys Val Asn Gly Ser Val Ala Arg Ala Ala Ile Arg His Leu
65 70 75 

GCCAAGa^aGATOCATCAAGAfiGATTGIlCCACCACAACQGACAGlOG 288 
Ala Lys Glu Gly Ser Ile Lys Lys Ile Val His His Asn Gly Gln Trp
80 85 90 

ATC TAC ACC OGA GOC ACT GCC OCT OCT GAC GCA TAAATCIGKT GGAnTCATC 341 
Ile Tyr Thr Arg Ala Thr Ala Ala Pro Asp Ala
95 100 

GATCTIGAAA AAXAAAAAAA AAAAAAAAAA AA 373

(2) INFORMATION FOR SEQ ID NO:43:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 103 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43:

Met Pro Pro Lys Val Lys Ala Lys Thr Gly Val Gly Lys Thr Gln Lys
1 5 10 15

Lys Lys Lys Trp Ser Lys Gly Lys Val Lys Asp Lys Ala Ala His His
20 25 30

Val Val Val Asp Gln Ala Thr Tyr Asp Lys Ile Val Lys Glu Val Pro
35 40 45

Thr Tyr Lys Leu Ile Ser Gln Ser Ile Leu Ile Asp Arg His Lys Val
50 55 60

Asn Gly Ser Val Ala Arg Ala Ala Ile Arg His Leu Ala Lys Glu Gly
65 70 75 80

Ser Ile Lys Lys Ile Val His His Asn Gly Gln Trp Ile Tyr Thr Arg
85 90 95

Ala Thr Ala Ala Pro Asp Ala 
100 

(2) INFORMATION FOR SEQ ID NO:44:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 514 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 13..435

(D) OTHER INFORMATION: /product= "PRcDNA78"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44:

AAAAAAGCX3V AT AIG CIT ATC TCT AAA CAG AAC AGG AGG QCC AIC TTC 48 
Met Leu Ile Ser Lys Gln Asn Arg Arg Ala Ile Phe
1 5 10

GAG AAC CTC TIC AM GRG GGA GIT QOC CnC QCE G(X AfiG GAC TTC 96
Glu Asn Leu Phe Lys Glu Gly Val Ala Val Ala Ala Lys Asp Phe Asn 
15 20 25 

OCT GOC ACC CAC OOC GfiG ATT GM GGTr GIC TOC AAC CTT GAG GIC AIC 144 
Ala Ala Thr His Pro Glu Ile Glu Gly Val Ser Asn Leu Glu Val Ile
30 35 40 

AAG GOC ATG GAG TOT TIG AOC TOC AAG QGA TAC GIG AAG AOC CM TIC 192 
Lys Ala Met Gln Ser Leu Thr Ser Lys Gly Tyr Val Lys Thr Gln Phe
45 50 55 60

TOG TC3G CAG TPC TOT TAC TAC AOC CIC AOC OCT GAG GGT CIT GAC TAC 240 
Ser Trp Gln Tyr Tyr Tyr Tyr Thr Leu Thr Pro Glu Gly Leu Asp Tyr

65 70 75 

CIC OGA GAG TTC CIC CAC CTT OOC TOC GAG ATT GIC COC AAC ACT CIC 288 
Leu Arg Glu Phe Leu His Leu Pro Ser Glu Ile Val Pro Asn Thr Leu
80 85 90 

AAG OGA OOC ADC OGA OCT GOC A;^ GOC CAG GGT OOC GGA GGT GOC TAC 336
Lys Arg Pro Thr Arg Pro Ala Lys Ala Gln Gly Pro Gly Gly Ala Tyr
95 100 105 

OGA GCT OOC OGA GOC GAG GGT GOC GGT OGA OGA GAG TAC OGA OGA OSA 384 
Arg Ala Pro Arg Ala Glu Gly Ala Gly Arg Gly Glu Tyr Arg Arg Arg
110 115 120 

GAG GAC GGT GOC GGT GOC TTC GGT GOCGGrOGAGGraGAOOCOGAGCT 432 
Glu Asp Gly Ala Gly Ala Phe Gly Ala Gly Arg Gly Gly Pro Arg Ala 
125 130 135 140

TAAATCOCAG AGCnTICIT TTTGrOGriG CIQGGACrAT QQCATtMGA GCIGGCTIGC 492 

AGAAAAAAAA AAAAAAAAAA AA 514 

(2) INFORMATION FOR SEQ ID NO:45:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 140 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:45:

Met Leu Ile Ser Lys Gln Asn Arg Arg Ala Ile Phe Glu Asn Leu Phe
1 5 10 15

Lys Glu Gly Val Ala Val Ala Ala Lys Asp Phe Asn Ala Ala Thr His
20 25 30

Pro Glu Ile Glu Gly Val Ser Asn Leu Glu Val Ile Lys Ala Met Gln
35 40 45

Ser Leu Thr Ser Lys Gly Tyr Val Lys Thr Gln Phe Ser Trp Gln Tyr
50 55 60

Tyr Tyr Tyr Thr Leu Thr Pro Glu Gly Leu Asp Tyr Leu Arg Glu Phe
65 70 75 80

Leu His Leu Pro Ser Glu Ile Val Pro Asn Thr Leu Lys Arg Pro Thr
85 90 95

Arg Pro Ala Lys Ala Gln Gly Pro Gly Gly Ala Tyr Arg Ala Pro Arg
100 105 110

Ala Glu Gly Ala Gly Arg Gly Glu Tyr Arg Arg Arg Glu Asp Gly Ala
115 120 125

Gly Ala Phe Gly Ala Gly Arg Gly Gly Pro Arg Ala 
130 135 140 

(2) INFORMATION FOR SEQ ID NO:46:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 437 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 30..308

(D) OTHER INFORMATION: /product= "PRcCNASS"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46:

CT00CTC3«G AAATCAflraV OOGGACRIC ATO TOC AAG GGA AOC AAG AAA GTT 53 

Met Ser Lys Arg Thr Lys Lys Val 
1 5 

GGA ATC AOC GGA AAG TAC GGA GTC OGA TPC GGA GCT TCC CTC OGA AAG 101
Gly Ile Thr Gly Lys Tyr Gly Val Arg Tyr Gly Ala Ser Leu Arg Lys
10 15 20

AOC GTC AAG AM NTG GfG GTC T3G CPG CAC GGT ACC TPC ADC TCT GAC 149
Thr Val Lys Lys Xaa Glu Val Trp Gin His Gly Thr Tyr Thr Cys Asp 
25 30 35 40 

TTC TGC GGA AAG GAC QOC GIC AAG OSA AOC OCT GIT GGT ATC TOG AAG 197 
Phe Cys Gly Lys Asp Ala Val Lys Arg Thr Ala Val Gly Ile Trp Lys
45 50 55 

T3C OGA GGA TGC OSA AAG AOC AOC GOC (XST OCT GOT T3G CAG CIT CK3 245 
Cys Arg Gly Cys Arg Lys Thr Thr Ala Gly Gly Ala Trp Gln Leu Gln
60 65 70

ADC ACT GOC OCT CTC ADC GIC AAG TDC ADC ACT OSA OGA CTC OGA GAG 293
Thr Thr Ala Ala Leu Thr Val Lys Ser Thr Thr Arg Arg Leu Arg Glu
75 80 85 

CTC AAG GAG CTT lAAATIGAAT TdGCRCMA GAiaAAACTG TK30GQQ03G 345
Leu Lys Glu Val 
90 

GAGAGAGIGG ATICRTICrr TITnnXIIA GATCTSAW3G GATC30CATGT CMDOCTTIC 405 

anOOOCAAA AAAAAAAAAA AAAAAAAAAA AA 437 

(2) INFORMATION FOR SEQ ID NO:47:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 92 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:

Met Ser Lys Arg Thr Lys Lys Val Gly Ile Thr Gly Lys Tyr Gly Val
1 5 10 15

Arg Tyr Gly Ala Ser Leu Arg Lys Thr Val Lys Lys Xaa Glu Val Trp
20 25 30

Gln His Gly Thr Tyr Thr Cys Asp Phe Cys Gly Lys Asp Ala Val Lys
35 40 45

Arg Thr Ala Val Gly Ile Trp Lys Cys Arg Gly Cys Arg Lys Thr Thr
50 55 60

Ala Gly Gly Ala Trp Gln Leu Gln Thr Thr Ala Ala Leu Thr Val Lys
65 70 75 80

Ser Thr Thr Arg Arg Leu Arg Glu Leu Lys Glu Val
85 90

(2) INFORMATION FOR SEQ ID NO:48:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 509 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 35..400

(D) OTHER INFORMATION: /product= "PRcDNA87"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:48:

GGAAGACXrrc ACAGCAW3AC TAAGAdCTC AAAC AIG OCT ADC AAG AOC GGC 52 
Met Ala Thr Lys Thr Gly

AftG ACT OGA TOC GCT CIC CfiG GftC GTC CJTT ACT CX3G GAG TAC AOC ATC 100
Lys Thr Arg Ser Ala Leu Gln Asp Val Val Thr Arg Glu Tyr Thr Ile
10 15 20 

CAC CIC CAC ARG TAC GTT CAC QGA AGGTCTTTCAAGAAGOGAGCTCXrr 148 
His Leu His Lys Tyr Val His Gly Arg Ser Phe Lys Lys Arg Ala Pro
25 30 35 

T3G GCr C?rC AAG TCC ATC CfiG GAG TIT GCT CTC AAG TCX3 ATS GGA AOC 196 
Trp Ala Val Lys Ser Ile Gln Glu Phe Ala Leu Lys Ser Met Gly Thr
40 45 50 



CGA GAT CTC OGA ATT GAC CCT AM3 TIG AAC CPG GCC GIC TOG QGA CRG 244 
Arg Asp Val Arg Ile Asp Pro Lys Leu Asn Gln Ala Val Trp Gly Gln
55 60 65 70 

QGT CTC AAG AAC OOC OOC AAG OGA CTC OSA ATC OGA CTT GAG OGA AAG 292 
Gly Val Lys Asn Pro Pro Lys Arg Leu Arg Ile Arg Leu Glu Arg Lys
75 80 85 

OGA AAC GAC GAG GAG GAT GCT AAG GAC AAG CTC TAC ACT CTT GCT AOC 340 
Arg Asn Asp Glu Glu Asp Ala Lys Asp Lys Leu Tyr Thr Leu Ala Thr
90 95 100 

CTC GTC OOC QGA GIC AOC AAC TTC AAG OCT CTC CAA ADC GIT GIC GTT 3B8 
Val Val Pro Gly Val Thr Asn Phe Lys Gly Leu Gln Thr Val Val Val
105 110 115 

GAC ADC GAG TAATnTCTC TIQGAnTrC ATGADGGTOG ATTCAGCICT 437 
Asp Thr Glu 
120 

TTCnGGOGC CAndTCrr ATSCACrCro ATOOCmCA OGAOOCNnT TINrnCINA 497 
13;AATAAAAA AA 509 

(2) INFORMATION FOR SEQ ID NO:49:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 121 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:49:

Met Ala Thr Lys Thr Gly Lys Thr Arg Ser Ala Leu Gln Asp Val Val
1 5 10 15

Thr Arg Glu Tyr Thr Ile His Leu His Lys Tyr Val His Gly Arg Ser
20 25 30

Phe Lys Lys Arg Ala Pro Trp Ala Val Lys Ser Ile Gln Glu Phe Ala
35 40 45

Leu Lys Ser Met Gly Thr Arg Asp Val Arg Ile Asp Pro Lys Leu Asn
50 55 60

Gln Ala Val Trp Gly Gln Gly Val Lys Asn Pro Pro Lys Arg Leu Arg
65 70 75 80

Ile Arg Leu Glu Arg Lys Arg Asn Asp Glu Glu Asp Ala Lys Asp Lys
85 90 95

Leu Tyr Thr Leu Ala Thr Val Val Pro Gly Val Thr Asn Phe Lys Gly
100 105 110

Leu Gln Thr Val Val Val Asp Thr Glu
115 120



(2) INFORMATION FOR SEQ ID NO:50:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 542 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Phaffia rhodozyma

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 18..443

(D) OTHER INFORMATION: /product= "PRcDNA95"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:50:

ACTOQCITaA CATCAfiG ATO TOC GIC GCT GTC CM ACT TIC OCT AW3 AAG 50 
Met Ser Val Ala Val Gln Thr Phe Gly Lys Lys
1 5 10

AAQ ACT GCC AGC GCT GIG GOC CAC GOC AOC CCT GGC CX3A GGT CTC ATC 98 
Lys Thr Ala Thr Ala Val Ala His Ala Thr Pro Gly Arg Gly Leu Ile
15 20 25 

OGA err AAC GGA ORG OCT ATC TCA CTT GCC GAG OCT OCT CIC CTC OCaA 146 
Arg Leu Asn Gly Gln Pro Ile Ser Leu Ala Glu Pro Ala Leu Leu Arg
30 35 40 

TAG AAG TAC TAC GAG CCT ATC CTC GTC ATC GGA GCT GAG AAG ATC AAC 194 
Tyr Lys Tyr Tyr Glu Pro Ile Leu Val Ile Gly Ala Glu Lys Ile Asn
45 50 55 

CPG ATC GAG ATC CGA CIC AAG GTC AAG GGT GGA GGA CAC GTC TCC CAG 242 
Gln Ile Asp Ile Arg Leu Lys Val Lys Gly Gly Gly His Val Ser Gln
60 65 70 75 

GIGTACQCCGICaGACAGGaCATCGGrAAGGOCATCGrcGCrTACTAC 290 
Val Tyr Ala Val Arg Gln Ala Ile Gly Lys Ala Ile Val Ala Tyr Tyr
80 85 90

GCTAAGAACGrcGATGOCGCCTCTQaCCrcGAGATCAAGAAGGCTCrC 338 
Ala Lys Asn Val Asp Ala Ala Ser Ala Leu Glu Ile Lys Lys Ala Leu
95 100 105 

GTC GCC TAC GAC CGA ACC CTC CTC ATC GCC GAT OOC OGA OGA ATG GAG 386 
Val Ala Tyr Asp Arg Thr Leu Leu Ile Ala Asp Pro Arg Arg Met Glu
110 115 120 

OOC AAG AAG TIC GGA GGA OOC GGA GOC OGA GCC OGA GTC CAG AAG TCT 434 
Pro Lys Lys Phe Gly Gly Pro Gly Ala Arg Ala Arg Val Gln Lys Ser
125 130 135 

TAC OGA TTtftfAAOTGr TIGTCrTGIG GrCiaX33QG TCATCTATCC AACATCnTG 490 
Tyr Arg
140

GAAAANRNTT (JTYTCXXSTCk TAIGTCKEGC C ' iVm ' A TGG AAAAAAAAAA AA 542 

(2) INFORMATION FOR SEQ ID NO:51:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 141 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:51:

Met Ser Val Ala Val Gln Thr Phe Gly Lys Lys Lys Thr Ala Thr Ala
1 5 10 15

Val Ala His Ala Thr Pro Gly Arg Gly Leu Ile Arg Leu Asn Gly Gln
20 25 30

Pro Ile Ser Leu Ala Glu Pro Ala Leu Leu Arg Tyr Lys Tyr Tyr Glu
35 40 45

Pro Ile Leu Val Ile Gly Ala Glu Lys Ile Asn Gln Ile Asp Ile Arg
50 55 60

Leu Lys Val Lys Gly Gly Gly His Val Ser Gln Val Tyr Ala Val Arg
65 70 75 80

Gln Ala Ile Gly Lys Ala Ile Val Ala Tyr Tyr Ala Lys Asn Val Asp
85 90 95

Ala Ala Ser Ala Leu Glu Ile Lys Lys Ala Leu Val Ala Tyr Asp Arg
100 105 110

Thr Leu Leu Ile Ala Asp Pro Arg Arg Met Glu Pro Lys Lys Phe Gly
115 120 125

Gly Pro Gly Ala Arg Ala Arg Val Gln Lys Ser Tyr Arg
130 135 140 
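
Each cDNA entry in the listing above gives a CDS location, and each paired protein entry gives a length in amino acids. The following is a minimal Python sketch, not part of the original listing, for checking that the two agree. It assumes that each CDS span includes exactly one terminal stop codon. The function and dictionary names are illustrative only, and the triples are the location and length values as they read in this copy of the listing.

```python
# Consistency check between CDS locations and stated protein lengths.
# Assumption: each CDS span (1-based, inclusive) ends with one stop codon.
# Names below are hypothetical helpers, not taken from the patent.

def cds_residue_count(start: int, end: int) -> int:
    """Return the number of residues encoded by a CDS spanning start..end."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(
            f"CDS {start}..{end} spans {length} nt, not a whole number of codons"
        )
    return length // 3 - 1  # subtract the stop codon


# (CDS start, CDS end, stated protein length) as transcribed from the listing.
ENTRIES = {
    "SEQ ID NO:30 -> 31": (30, 359, 109),
    "SEQ ID NO:32 -> 33": (7, 282, 91),
    "SEQ ID NO:34 -> 35": (19, 321, 100),
    "SEQ ID NO:36 -> 37": (18, 453, 145),
    "SEQ ID NO:38 -> 39": (81, 416, 111),
    "SEQ ID NO:40 -> 41": (49, 501, 150),
    "SEQ ID NO:42 -> 43": (13, 324, 103),
    "SEQ ID NO:44 -> 45": (13, 435, 140),
    "SEQ ID NO:46 -> 47": (30, 308, 92),
    "SEQ ID NO:48 -> 49": (35, 400, 121),
    "SEQ ID NO:50 -> 51": (18, 443, 141),
}


def check_entries(entries):
    """Map each entry name to True when location and stated length agree."""
    results = {}
    for name, (start, end, stated) in entries.items():
        try:
            results[name] = cds_residue_count(start, end) == stated
        except ValueError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in check_entries(ENTRIES).items():
        print(f"{name}: {'consistent' if ok else 'check manually'}")
```

Under the stop-codon assumption, the location given for SEQ ID NO:36 (18..453) is the only span that is not a whole number of codons. Whether that reflects a partial final codon in the clone or a transcription problem in this copy of the listing is not resolved here.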
Claims

1. Recombinant DNA comprising a transcription promoter and a downstream sequence to be
expressed, in operable linkage therewith,

wherein the transcription promoter comprises a region found upstream of the open reading
frame of a highly expressed Phaffia gene.

2. Recombinant DNA according to claim 1, wherein said highly expressed Phaffia gene is a
glycolytic pathway gene.

3. Recombinant DNA according to claim 2, wherein said glycolytic pathway gene is a gene coding 
for Glyceraldehyde-3-Phosphate Dehydrogenase. 

4. Recombinant DNA according to claim 1, wherein said highly expressed Phaffia gene is a
ribosomal protein encoding gene.

5. Recombinant DNA comprising a transcription promoter and a downstream sequence to be 
expressed, in operable linkage therewith, 

wherein the transcription promoter comprises a region found upstream of the open reading 
frame encoding a protein as represented by one of the amino acid sequences depicted in any one of
SEQIDNOs: 24 to 50. 

6. A recombinant DNA according to any one of the preceding claims, wherein said downstream 
sequence to be expressed is heterologous with respect to the transcription promoter sequence. 

7. A recombinant DNA according to any one of claims 1 to 6, wherein the downstream sequence 
comprises an open reading frame coding for a polypeptide responsible for reduced sensitivity against a 
selective agent. 

8. A recombinant DNA according to claim 7, wherein said selective agent is G418.

9. A recombinant DNA according to any one of claims 1 to 6, wherein the said downstream 
sequence to be expressed codes for an enzyme involved in the carotenoid biosynthesis pathway. 

10. A recombinant DNA according to claim 9, wherein said downstream sequence to be expressed
encodes an enzyme having an activity selected from the group consisting of isopentenyl pyrophosphate
isomerase, geranylgeranyl pyrophosphate synthase, phytoene synthase, phytoene desaturase, and lycopene
cyclase.

11. A recombinant DNA according to claim 10, wherein said downstream sequence to be expressed
encodes an enzyme having an amino acid sequence selected from the one represented by SEQIDNO: 13,
SEQIDNO: 15, SEQIDNO: 17, SEQIDNO: 19, SEQIDNO: 21 or SEQIDNO: 23.

12. A recombinant DNA according to any one of the preceding claims, wherein said recombinant
DNA comprises further a transcription terminator downstream from the said DNA sequence to be 
expressed, in operable linkage therewith. 

13. A recombinant DNA according to claim 12, wherein the terminator is a GAPDH-encoding gene 
terminator fragment.

14. A recombinant DNA according to any one of the preceding claims, wherein the recombinant
DNA is in the form of a vector capable of replication and/or integration in a host organism. 

15. A recombinant DNA according to claim 14, further comprising Phaffia ribosomal RNA
encoding DNA. 

16. A recombinant DNA according to claim 15, which is linearised by cleaving inside the Phaffia
ribosomal RNA encoding DNA portion. 

17. A microorganism harbouring a recombinant DNA according to any one of the preceding claims. 

18. A microorganism according to claim 17, which is Phaffia rhodozyma.

19. A microorganism according to claim 18, having the recombinant DNA integrated into its
genome in an amount of 50 copies or more. 

20. An isolated DNA fragment comprising a Phaffia GAPDH-gene, or a functional fragment
thereof.

21. Use of a functional fragment according to claim 20 for making a recombinant DNA construct.

22. The use according to claim 21, wherein said fragment is a regulatory region normally located 
upstream or downstream of the open reading frame coding for GAPDH in Phaffia rhodozyma.

23. A method for obtaining a transformed Phaffia strain, comprising the steps of

(a) contacting cells or protoplasts of a Phaffia strain with recombinant DNA under conditions
conducive to uptake thereof,
said recombinant DNA comprising a transcription promoter and a downstream sequence to be
expressed in operable linkage therewith,

(b) identifying Phaffia rhodozyma cells or protoplasts having obtained the said recombinant
DNA in expressible form,
wherein the recombinant DNA is one according to any one of the preceding claims.

24. A method according to claim 23, comprising the additional step of providing an electropulse
after contacting of Phaffia cells or protoplasts with the said recombinant DNA. 

25. A transformed Phaffia strain obtainable by a method according to any one of the preceding
claims, said strain, upon cultivation, being capable of expression of the said downstream sequence, as a
consequence of transformation with the said recombinant DNA.

26. A transformed Phaffia strain according to claim 25, wherein the said downstream sequence
codes for a pharmaceutical protein.

27. A transformed Phaffia strain according to any one of claims 24 to 26, wherein the said Phaffia
strain contains at least 10, preferably at least 50, copies of the said recombinant DNA integrated into its 
genome. 

28. An isolated DNA sequence coding for an enzyme involved in the carotenoid biosynthetic 
pathway of Phaffia rhodozyma. 

29. An isolated DNA sequence according to claim 28, wherein said enzyme has an activity selected 
from isopentenyl pyrophosphate isomerase activity, geranylgeranyl pyrophosphate synthase activity, 
phytoene synthase activity, phytoene desaturase activity and lycopene cyclase activity. 

30. An isolated DNA sequence coding for an enzyme having an amino acid sequence selected from 
the one represented by SEQIDNO: 13, SEQIDNO: 15, SEQIDNO: 17, SEQIDNO: 19, SEQIDNO: 21 or 
SEQIDNO: 23. 

31. An isolated DNA sequence coding for a variant of an enzyme according to claim 30, said 
variant being selected from (i) an allelic variant, (ii) an enzyme having one or more amino acid 
additions, deletions and/or substitutions and still having the stated enzymatic activity. 

32. An isolated DNA sequence encoding an enzyme involved in the carotenoid biosynthesis pathway 
selected from: 

(i) a DNA sequence as represented in SEQIDNO: 12, SEQIDNO: 14, SEQIDNO: 16, SEQIDNO: 18, 
SEQIDNO: 20 or SEQIDNO: 22; 

(ii) an isocoding variant of the DNA sequence represented in SEQIDNO: 12, SEQIDNO: 14, SEQIDNO: 
16, SEQIDNO: 18, SEQIDNO: 20 or SEQIDNO: 22; 

(iii) an allelic variant of a DNA sequence as represented in SEQIDNO: 12, SEQIDNO: 14, SEQIDNO: 
16, SEQIDNO: 18, SEQIDNO: 20 or SEQIDNO: 22; 

(iv) a DNA sequence capable, when bound to nitrocellulose filter and after incubation under hybridising 
conditions and subsequent washing, of specifically hybridising to a radio-labelled DNA fragment having 
the sequence represented in SEQIDNO: 12, SEQIDNO: 14, SEQIDNO: 16, SEQIDNO: 18, SEQIDNO: 
20 or SEQIDNO: 22, as detectable by autoradiography of the filter after incubation and washing, 
wherein said incubation under hybridising conditions and subsequent washing is performed by incubating 
the filter-bound DNA at a temperature of at least 50°C, preferably at least 55°C, in the presence of a 
solution of the said radio-labelled DNA in 0.3 M NaCl, 40 mM Tris-HCl, 2 mM EDTA, 0.1% SDS, pH 
7.8 for at least one hour, whereafter the filter is washed at least twice for about 20 minutes in 0.3 M 
NaCl, 40 mM Tris-HCl, 2 mM EDTA, 0.1% SDS, pH 7.8, at a temperature of 50°C, preferably at least 
55°C, prior to autoradiography. 
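
The buffer composition recited in item (iv) can be turned into weighed amounts with simple arithmetic. The following is only a minimal sketch and not part of the claim: it assumes that Tris is weighed as the hydrochloride salt, that EDTA is added as the disodium dihydrate, and that 0.1% SDS means weight per volume. The function name `buffer_recipe` is hypothetical. The pH would still have to be adjusted to 7.8 at the bench.

```python
# Molar masses in g/mol (standard values for the assumed reagent forms).
MOLAR_MASS = {
    "NaCl": 58.44,
    "Tris-HCl": 157.60,        # assumption: weighed as the hydrochloride salt
    "Na2EDTA.2H2O": 372.24,    # assumption: EDTA added as disodium dihydrate
}

# Concentrations recited in the claim, in mol/L.
MOLARITY = {
    "NaCl": 0.3,
    "Tris-HCl": 0.040,
    "Na2EDTA.2H2O": 0.002,
}

# Assumption: 0.1% SDS is read as weight/volume, i.e. 1 g per litre.
SDS_GRAMS_PER_LITRE = 1.0


def buffer_recipe(volume_l):
    """Return grams of each component needed for `volume_l` litres of buffer."""
    if volume_l <= 0:
        raise ValueError("volume must be positive")
    grams = {
        name: round(MOLARITY[name] * MOLAR_MASS[name] * volume_l, 3)
        for name in MOLARITY
    }
    grams["SDS"] = round(SDS_GRAMS_PER_LITRE * volume_l, 3)
    return grams


if __name__ == "__main__":
    for component, mass in buffer_recipe(0.5).items():
        print(f"{component}: {mass} g")
```

The same buffer is used for both the incubation and the washes, so a single stock volume can be computed for the whole procedure.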

33. Recombinant DNA comprising an isolated DNA sequence according to any one of claims 27 to 
32. 



34. Recombinant DNA according to claim 33, wherein said isolated DNA sequence is operably 
linked to a transcription promoter capable of being expressed in a suitable host, said isolated DNA 
sequence optionally being linked also to a transcription terminator functional in the said host. 

35. Recombinant DNA according to claim 34, wherein said host is a Phaffia strain. 

36. Recombinant DNA according to any one of claims 33 to 35, wherein the transcription promoter 
is from a glycolytic pathway gene present in Phaffia. 

37. Recombinant DNA according to claim 36, wherein said glycolytic pathway gene is a gene 
coding for Glyceraldehyde-3-Phosphate Dehydrogenase. 

38. Recombinant DNA according to any one of claims 33 to 35, wherein the transcription promoter 
is from a ribosomal protein encoding gene. 

39. Recombinant DNA according to any one of claims 33 to 35, wherein the transcription promoter 
comprises a region found upstream of the open reading frame encoding a protein as represented by one 
of the amino acid sequences depicted in any one of SEQIDNOs: 24 to 50. 

40. Recombinant DNA according to any one of claims 27 to 39, wherein said recombinant DNA 
comprises further a transcription terminator downstream from the said heterologous DNA sequence to be 
expressed, in operable linkage therewith, which terminator is a Phaffia transcription terminator. 

41. Recombinant DNA according to any one of claims 27 to 40, being in the form of a vector. 

42. Use of a vector according to claim 41 to transform a host. 

43. Use according to claim 19, wherein the host is a Phaffia strain. 

44. A host obtainable by transformation, optionally of an ancestor, using a recombinant DNA 
according to any one of claims 27 to 41. 

45. A host according to claim 44, which is a Phaffia strain, preferably a Phaffia rhodozyma strain. 

46. A transformed Phaffia rhodozyma strain which is capable of overexpressing a DNA sequence 
encoding an enzyme involved in the carotenoid biosynthesis pathway. 

47. A transformed Phaffia rhodozyma strain according to claim 46, which produces increased 
amounts of astaxanthin relative to its untransformed ancestor. 

48. A method for producing an enzyme involved in the carotenoid biosynthesis pathway, by 
culturing a host according to claim 44 or 45, under conditions conducive to the production of said 
enzyme. 

49. A method for producing a carotenoid, characterised in that a host according to any one of 
claims 44 to 47 is cultivated under conditions conducive to the production of the carotenoid. 

50. A method according to claim 49, wherein the carotenoid is astaxanthin. 

51. A method for producing a pharmaceutical protein by culturing a transformed Phaffia strain 
according to claim 26 under conditions conducive to the production of the said protein. 

52. A method for the isolation of a promoter from a highly expressed gene in Phaffia, comprising 
the steps of: 

(a) making a cDNA library on mRNA isolated from a Phaffia strain grown under desired conditions; 

(b) determining (part of) the nucleotide sequence of the (partial) cDNAs obtained in step (a); 

(c) comparing the obtained sequence data in step (b) to known sequence data; 

(d) cloning or amplifying putative promoter fragments of the gene located either directly upstream of the 
open reading frame or directly upstream of the transcription start site of the gene corresponding to the 
expressed cDNA, and 

(e) verifying whether the promoter sequences obtained give high-level expression in a Phaffia strain, by 
expressing a suitable marker under the control of the promoter in a transformed Phaffia strain. 
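
Steps (b) and (c) of claim 52 amount to annotating sequenced cDNA clones against known sequences and then choosing genes whose promoters are worth isolating. The sketch below illustrates one possible selection rule, which is an assumption and is not stated in the claim: genes represented by many independent cDNA clones are treated as candidates for high expression. The function name `rank_expressed_genes`, the `min_clones` threshold and the gene labels in the usage example are all hypothetical.

```python
from collections import Counter


def rank_expressed_genes(cdna_hits, min_clones=2):
    """Rank genes by the number of cDNA clones assigned to them.

    cdna_hits: iterable of (clone_id, gene_label) pairs, where gene_label is
        the best match found when comparing the clone sequence to known
        sequence data, or None when no match was found.
    min_clones: hypothetical cut-off; genes with fewer clones are dropped.

    Returns a list of (gene_label, clone_count), most frequent first.
    """
    counts = Counter(gene for _, gene in cdna_hits if gene is not None)
    return [(gene, n) for gene, n in counts.most_common() if n >= min_clones]


if __name__ == "__main__":
    # Illustrative data only; these clone assignments are invented.
    hits = [
        ("clone01", "GAPDH"),
        ("clone02", "ribosomal protein A"),
        ("clone03", "GAPDH"),
        ("clone04", None),
        ("clone05", "GAPDH"),
        ("clone06", "ribosomal protein A"),
        ("clone07", "unrelated gene"),
    ]
    for gene, n in rank_expressed_genes(hits):
        print(gene, n)
```

Candidates selected this way would still need the experimental verification of step (e), since clone frequency in a library is only an indirect indication of promoter strength.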

1. Recombinant DNA comprising a transcription promoter and downstream region to 
be expressed where the transcription promoter comprises a region found 
upstream of a highly expressed Phaffia gene, method of transforming a Phaffia 
strain where the transcription promoter is from a glycolytic pathway gene, to 
express a downstream sequence, recombinant DNA thereof, including a selective 
agent and the transformed Phaffia strains: Claims 2, 3, 13, 36 and 37 
{completely} and Claims 1, 6 to 14, 17 to 19, 22 to 27, 33 to 35 and 40 to 45 and 51 
{partially}. 

2. Recombinant DNA comprising a transcription promoter and downstream region to 
be expressed where the transcription promoter comprises a region found 
upstream of a highly expressed Phaffia gene, method of transforming a Phaffia 
strain where the transcription promoter is from a ribosomal protein, to express a 
downstream sequence, recombinant DNA thereof and the transformed Phaffia 
strains: Claims 4, 5, 15, 16, 38 and 39 {completely} and Claims 1, 6 to 12, 14, 17 
to 19, 22 to 27, 33 to 35 and 40 to 45 and 51 {partially}. 

3. An isolated DNA fragment comprising a Phaffia GAPDH-gene and use in the 
construction of a DNA construct: Claims 20 to 21 {completely} and Claim 22 
{partially}. 

4. An isolated DNA sequence coding for an enzyme involved in the carotenoid 
biosynthetic pathway of Phaffia rhodozyma and recombinant DNA comprising a 
transcription promoter and the downstream region to be expressed codes for an 
enzyme involved in the carotenoid biosynthesis pathway and the transformed 
Phaffia strains comprising said DNA: Claims 1, 6, 9 to 12, 14, 17 to 19, 23 to 
27, 28 to 35 and 40 to 50 {partially} 

5. An isolated DNA sequence coding for an enzyme involved in the carotenoid 
biosynthetic pathway of Phaffia rhodozyma, and recombinant DNA comprising a 
transcription promoter and the downstream region to be expressed codes for an 
enzyme involved in the carotenoid biosynthesis pathway and the transformed 
Phaffia strains comprising said DNA, where the enzyme has isopentenyl 
pyrophosphate isomerase activity: Claims 1, 6, 9 to 12, 14, 17 to 19, 23 to 27, 28 
to 35 and 40 to 50 {partially} 

6. An isolated DNA sequence coding for an enzyme involved in the carotenoid 
biosynthetic pathway of Phaffia rhodozyma, and recombinant DNA comprising a 
transcription promoter and the downstream region to be expressed codes for an 
enzyme involved in the carotenoid biosynthesis pathway and the transformed 
Phaffia strains comprising said DNA, where the enzyme has geranylgeranyl 
pyrophosphate synthase activity: Claims 1, 6, 9 to 12, 14, 17 to 19, 23 to 27, 
28 to 35 and 40 to 50 {partially} 

7. An isolated DNA sequence coding for an enzyme involved in the carotenoid 
biosynthetic pathway of Phaffia rhodozyma and recombinant DNA comprising a 
transcription promoter and the downstream region to be expressed codes for an 
enzyme involved in the carotenoid biosynthesis pathway and the transformed 
Phaffia strains comprising said DNA, where the enzyme has phytoene synthase 
activity: Claims 1, 6, 9 to 12, 14, 17 to 19, 23 to 27, 28 to 35 and 40 to 50 
{partially} 

8. An isolated DNA sequence coding for an enzyme involved in the carotenoid 
biosynthetic pathway of Phaffia rhodozyma and recombinant DNA comprising a 
transcription promoter and the downstream region to be expressed codes for an 
enzyme involved in the carotenoid biosynthesis pathway and the transformed 
Phaffia strains comprising said DNA, where the enzyme has phytoene 
desaturase activity: Claims 1, 6, 9 to 12, 14, 17 to 19, 23 to 27, 28 to 35 and 40 
to 50 {partially} 

9. An isolated DNA sequence coding for an enzyme involved in the carotenoid 
biosynthetic pathway of Phaffia rhodozyma and recombinant DNA comprising a 
transcription promoter and the downstream region to be expressed codes for an 
enzyme involved in the carotenoid biosynthesis pathway and the transformed 
Phaffia strains comprising said DNA, where the enzyme has lycopene cyclase 
activity: Claims 1, 6, 9 to 12, 14, 17 to 19, 23 to 27, 28 to 35 and 40 to 50 
{partially} 



10. Method for the isolation of a promoter from a gene expressed in Phaffia: 
Claim 52 {completely} 